July 7, 2012 at 3:28 pm
Using, SQL2005, I've been given a project that requires shredding some predefined XML. I'm having a hell of a time doing it. I'm new to the XML data type and the various methods.
Given the XML below, what is a practical way to "flatten it", so that I can normalize it into tables:
Account: AccountNum, AccountStatus
Player: AccountNum, PlayerID, FirstName, LastName
PlayerAddresses: AccountNum, PlayerID, AddressType, City, State, Zip
AddressPhones: AccountNum, PlayerID, AddressType, PhoneNumber, PhoneType
declare @myXML XML
select @myXML =
'<AccountDetailsRsp AccountNum="1" AccountStatus="AccountStatus1">
<PlayerInfo PlayerID="1" FirstName="FirstName1" LastName="LastName1">
<AddressList>
<PlayerAddress>
<Address AddressType="primary", City="City1" State="State1" Zip="Zip1">
<FutureUse />
</Address>
<Phone PhoneNumber="PhoneNumber1" PhoneType="Type1" />
<Phone PhoneNumber="PhoneNumber2" PhoneType="Type2" />
</PlayerAddress>
<Address AddressType="billing", City="City1" State="State1" Zip="Zip1">
<FutureUse />
</Address>
<Phone PhoneNumber="PhoneNumber1" PhoneType="Type1" />
<Phone PhoneNumber="PhoneNumber2" PhoneType="Type2" />
</PlayerAddress>
</AddressList>
</PlayerInfo>
</AccountDetailsRsp>'
I've tried using examples from the following two links but am having no success:
http://www.sqlservercentral.com/articles/Miscellaneous/2996/
http://blogs.msdn.com/b/simonince/archive/2009/04/24/flattening-xml-data-in-sql-server.aspx
Thanks!
July 8, 2012 at 8:54 am
I apologize - the XML example I provided was not valid. I attempted to abstract the actual XML because it's much bigger and I wanted to simplify it for this post. But I did a poor job. The following appears to pass a validator:
<AccountDetailsRsp AccountNum="1" AccountStatus="AccountStatus1">
<PlayerInfo PlayerID="1" FirstName="FirstName1" LastName="LastName1">
<AddressList>
<PlayerAddress>
<Address AddressType="primary" City="City1" State="State1" Zip="Zip1"/>
<FutureUse />
<Phone PhoneNumber="PhoneNumber1" PhoneType="Type1" />
<Phone PhoneNumber="PhoneNumber2" PhoneType="Type2" />
</PlayerAddress>
<PlayerAddress>
<Address AddressType="billing" City="City1" State="State1" Zip="Zip1"/>
<FutureUse />
<Phone PhoneNumber="PhoneNumber1" PhoneType="Type1" />
<Phone PhoneNumber="PhoneNumber2" PhoneType="Type2" />
</PlayerAddress>
</AddressList>
</PlayerInfo>
</AccountDetailsRsp>
July 8, 2012 at 3:08 pm
After reading more carefully through this:
http://blogs.msdn.com/b/simonince/archive/2009/04/24/flattening-xml-data-in-sql-server.aspx
I was able to construct the following query, that gets the results in a set that I can then easily normalize. I understand how this works... I'd like to say I could do this again without effort, but it's tedious. In addition I have no clue whether this is efficient. Efficiency isn't a big deal in my case, since my sets will be small and not high volume, but still would be nice to know.
Here is the solution I came up with:
[font="Courier New"]
select distinct
accountRef.value('@AccountNum', 'varchar(10)') AS AccountNum,
accountRef.value('@AccountStatus', 'varchar(20)') AS AccountStatus,
playerRef.value('@PlayerID', 'varchar(10)') as PlayerID,
addressRef.value('@AddressType', 'varchar(10)') as AddressType,
addressRef.value('@City', 'varchar(10)') as City,
addressRef.value('@State', 'varchar(10)') as Zip,
phoneRef.value('@PhoneNumber', 'varchar(15)') as Phone,
phoneRef.value('@PhoneType', 'varchar(15)') as PhoneType
from @myXML.nodes('/AccountDetailsRsp') as account(accountRef) cross apply
accountRef.nodes('//PlayerInfo') as player(playerRef) cross apply
playerRef.nodes('//AddressList/PlayerAddress/Address') as addresses(addressRef) cross apply
playerRef.nodes('//AddressList/PlayerAddress/Phone') as phones(phoneRef)
order by accountNum, AddressType, Phone
AccountNum AccountStatus PlayerID AddressType City Zip Phone PhoneType
---------- -------------------- ---------- ----------- ---------- ---------- --------------- ---------------
1 AccountStatus1 1 billing City1 State1 PhoneNumber1 Type1
1 AccountStatus1 1 billing City1 State1 PhoneNumber2 Type2
1 AccountStatus1 1 primary City1 State1 PhoneNumber1 Type1
1 AccountStatus1 1 primary City1 State1 PhoneNumber2 Type2
(4 row(s) affected)
[/font]
July 9, 2012 at 12:17 am
That actually doesn't work so well. Try it with the following data and see...
<AccountDetailsRsp AccountNum="1" AccountStatus="AccountStatus1">
<PlayerInfo PlayerID="1" FirstName="FirstName1" LastName="LastName1">
<AddressList>
<PlayerAddress>
<Address AddressType="primary" City="City1" State="State1" Zip="Zip1"/>
<FutureUse />
<Phone PhoneNumber="PhoneNumber1" PhoneType="Type1" />
<Phone PhoneNumber="PhoneNumber2" PhoneType="Type2" />
</PlayerAddress>
<PlayerAddress>
<Address AddressType="billing" Zip="Zip2" State="State2" City="City2" />
<FutureUse />
<Phone PhoneNumber="PhoneNumber3" PhoneType="Type3" />
</PlayerAddress>
</AddressList>
</PlayerInfo>
</AccountDetailsRsp>
What I get from that using your code is...
AccountNum AccountStatus PlayerID AddressType City Zip Phone PhoneType
---------- -------------- -------- ----------- ----- ------ ------------ ---------
1 AccountStatus1 1 billing City2 State2 PhoneNumber1 Type1
1 AccountStatus1 1 billing City2 State2 PhoneNumber2 Type2
1 AccountStatus1 1 billing City2 State2 PhoneNumber3 Type3
1 AccountStatus1 1 primary City1 State1 PhoneNumber1 Type1
1 AccountStatus1 1 primary City1 State1 PhoneNumber2 Type2
1 AccountStatus1 1 primary City1 State1 PhoneNumber3 Type3
(6 row(s) affected)
I hope someone else drops in because I don't know enough about XML to fix it.
--Jeff Moden
Change is inevitable... Change for the better is not.
July 9, 2012 at 10:12 am
It's interesting that there isn't more responders... I read into that that everyone hates this XML stuff.
From what I can see, what makes this especially difficult is that for any given player, there are multiple addresses, and for any given address there are multiple phones. However, the only key value for an address (and therefore the way to relate the Phones to the Address) is the AddressType attribute, that is part of the Address node, which is a sibling of the Phone nodes. It would have been much better to include those attributes in the parent PlayerAddress node (and not have the Address node). Then the key AddressType value could have been associated with the Phones. Unfortunately, this is what I've been given to work with.
The only resolution I see is a RBAR solution. First get a result set that includes the attributes for each Player as columns, and a column with the XML fragment for the PlayerAddresses. You can do that with this:
select
accountRef.value('@AccountNum', 'varchar(10)') AS AccountNum,
accountRef.value('@AccountStatus', 'varchar(20)') AS AccountStatus,
playerRef.value('@PlayerID', 'varchar(10)') as PlayerID,
playerRef.value('@FirstName', 'varchar(10)') as FirstName,
playerRef.value('@LastName', 'varchar(10)') as LastName,
playerRef.query('AddressList') as AddressList
into #tempTbl
from @myXML.nodes('/AccountDetailsRsp') as account(accountRef) cross apply
accountRef.nodes('//PlayerInfo') as player(playerRef)
Then, for each player,
1. count the number of PlayerAddress nodes
2. using the row filter, query for each PlayerAddress node individually, parsing the details and storing
I hope not to have to do that... hopefully someone else knows a better way.
July 9, 2012 at 11:25 am
Your query doesn't match the sample data. That said you're on the right track. You just didn't get the cross apply structures quite right.
Assuming you do it this way - you will get the appropriate data joins:
declare @xml xml
set @xml=cast('<AccountDetailsRsp AccountNum="1" AccountStatus="AccountStatus1">
<PlayerInfo PlayerID="1" FirstName="FirstName1" LastName="LastName1"> <AddressList> <PlayerAddress> <Address AddressType="primary" City="City1" State="State1" Zip="Zip1"/> <FutureUse /> <Phone PhoneNumber="PhoneNumber1" PhoneType="Type1" /> <Phone PhoneNumber="PhoneNumber2" PhoneType="Type2" /> </PlayerAddress> <PlayerAddress> <Address AddressType="billing" Zip="Zip2" State="State2" City="City2" /> <FutureUse /> <Phone PhoneNumber="PhoneNumber3" PhoneType="Type3" /> </PlayerAddress> </AddressList> </PlayerInfo>
<PlayerInfo PlayerID="2" FirstName="FirstName1" LastName="LastName1"> <AddressList> <PlayerAddress> <Address AddressType="primary" City="City1" State="State1" Zip="Zip1"/> <FutureUse /> <Phone PhoneNumber="PhoneNumber1" PhoneType="Type1" /> <Phone PhoneNumber="PhoneNumber2" PhoneType="Type2" /> </PlayerAddress> <PlayerAddress> <Address AddressType="billing" Zip="Zip2" State="State2" City="City2" /> <FutureUse /> <Phone PhoneNumber="PhoneNumber3" PhoneType="Type3" /> </PlayerAddress> </AddressList> </PlayerInfo>
</AccountDetailsRsp>' as XML)
selectact.value('(@AccountNum)[1]','Int') account,
player.value('(@PlayerID)[1]','Int') play,
c.value('(Address/@AddressType)[1]','nvarchar(100)') addrtype,
d.value('(@PhoneNumber)[1]','nvarchar(100)') phone
from @xml.nodes('AccountDetailsRsp') account(act) cross apply
act.nodes('PlayerInfo') p(player)
cross apply player.nodes('AddressList/PlayerAddress') x(c)
cross apply c.nodes('Phone') y(d) --this need to be relative to PlayerAddress, NOT above
Notice that I am stopping just shy of going to PlayerAddress within the cross apply, so you can point out that you are joining based on their mutual relation to playerAddress. I could throw in another cross apply if need be but it isn't necessary in this case.
Edit: typoed the node name they both join by.
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
July 9, 2012 at 12:08 pm
Ah! I tried something like that, referencing PlayerAddress rather than PlayerAddress/Address and then specifying the path "Address/@AddressType" but I would get an error:
XQuery [value()]: 'value()' requires a singleton (or empty sequence), found operand of type 'xdt:untypedAtomic *'
I did not know what that was, and couldn't figure out from documentation, but I see the [1] filter resolves that. I'll need to study it more to fully understand, but this does in fact work.
Thank you very much!
July 9, 2012 at 3:43 pm
Matt Miller (#4) (7/9/2012)
Your query doesn't match the sample data. That said you're on the right track. You just didn't get the cross apply structures quite right.Assuming you do it this way - you will get the appropriate data joins:
declare @xml xml
set @xml=cast('<AccountDetailsRsp AccountNum="1" AccountStatus="AccountStatus1">
<PlayerInfo PlayerID="1" FirstName="FirstName1" LastName="LastName1"> <AddressList> <PlayerAddress> <Address AddressType="primary" City="City1" State="State1" Zip="Zip1"/> <FutureUse /> <Phone PhoneNumber="PhoneNumber1" PhoneType="Type1" /> <Phone PhoneNumber="PhoneNumber2" PhoneType="Type2" /> </PlayerAddress> <PlayerAddress> <Address AddressType="billing" Zip="Zip2" State="State2" City="City2" /> <FutureUse /> <Phone PhoneNumber="PhoneNumber3" PhoneType="Type3" /> </PlayerAddress> </AddressList> </PlayerInfo>
<PlayerInfo PlayerID="2" FirstName="FirstName1" LastName="LastName1"> <AddressList> <PlayerAddress> <Address AddressType="primary" City="City1" State="State1" Zip="Zip1"/> <FutureUse /> <Phone PhoneNumber="PhoneNumber1" PhoneType="Type1" /> <Phone PhoneNumber="PhoneNumber2" PhoneType="Type2" /> </PlayerAddress> <PlayerAddress> <Address AddressType="billing" Zip="Zip2" State="State2" City="City2" /> <FutureUse /> <Phone PhoneNumber="PhoneNumber3" PhoneType="Type3" /> </PlayerAddress> </AddressList> </PlayerInfo>
</AccountDetailsRsp>' as XML)
selectact.value('(@AccountNum)[1]','Int') account,
player.value('(@PlayerID)[1]','Int') play,
c.value('(Address/@AddressType)[1]','nvarchar(100)') addrtype,
d.value('(@PhoneNumber)[1]','nvarchar(100)') phone
from @xml.nodes('AccountDetailsRsp') account(act) cross apply
act.nodes('PlayerInfo') p(player)
cross apply player.nodes('AddressList/PlayerAddress') x(c)
cross apply c.nodes('Phone') y(d) --this need to be relative to PlayerAddress, NOT above
Notice that I am stopping just shy of going to PlayerAddress within the cross apply, so you can point out that you are joining based on their mutual relation to playerAddress. I could throw in another cross apply if need be but it isn't necessary in this case.
Edit: typoed the node name they both join by.
You have no idea how much this helped me on a "little" automation project I'm working on. I even figured out that OUTER APPLY instead of CROSS APPLY will allow for missing nodes at the "leaf" level. I also was having a heck of a time understanding the stucture of things like this because everyone seems to have their own style and they never really explain the basics. Because of this wonderfully simple yet complex example and actually understanding what the "edge" table looks like behind the scenes (which is part of the reason for my "extra" OUTER APPLYs), I've finally nailed down something that I thought was going to eat up a whole lot more of my spare time.
Thank you BOTH for posting the original question and such nice examples.
Here's what I ended up with, BTW. I just need to document things a bit (all though the naming convention I used pretty much tells it all) and I'm all set for the next phase of my project.
DECLARE @XML XML
SET @XML=CAST('
<AccountDetailsRsp AccountNum="1" AccountStatus="AccountStatus1">
<PlayerInfo PlayerID="1" FirstName="FirstName1" LastName="LastName1">
<AddressList>
<PlayerAddress>
<Address AddressType="primary" City="City1" State="State1" Zip="Zip1"/>
<FutureUse />
<Phone PhoneNumber="PhoneNumber1" PhoneType="Type1" />
<Phone PhoneNumber="PhoneNumber2" PhoneType="Type2" />
</PlayerAddress>
<PlayerAddress>
<Address AddressType="billing" Zip="Zip2" State="State2" City="City2" />
<FutureUse />
<Phone PhoneNumber="PhoneNumber3" PhoneType="Type3" />
</PlayerAddress>
</AddressList>
</PlayerInfo>
<PlayerInfo PlayerID="2" FirstName="FirstName2" LastName="LastName2">
<AddressList>
<PlayerAddress>
<Address AddressType="primary" City="City3" State="State3" Zip="Zip3"/>
<FutureUse />
<Phone PhoneNumber="PhoneNumber4" PhoneType="Type4" />
<Phone PhoneNumber="PhoneNumber5" PhoneType="Type5" />
</PlayerAddress>
<PlayerAddress>
<Address AddressType="billing" Zip="Zip4" State="State4" City="City4" />
<FutureUse />
</PlayerAddress>
</AddressList>
</PlayerInfo>
</AccountDetailsRsp>'
AS XML)
SELECT AccountNum = x1.AccountDetailsRsp.value ('@AccountNum' ,'VARCHAR(28)'),
AccountStatus = x1.AccountDetailsRsp.value ('@AccountStatus' ,'VARCHAR(28)'),
PlayerID = x2.PlayerInfo.value ('@PlayerID' ,'VARCHAR(28)'),
FirstName = x2.PlayerInfo.value ('@FirstName' ,'VARCHAR(28)'),
LastName = x2.PlayerInfo.value ('@LastName' ,'VARCHAR(28)'),
AddressType = x5.Address.value ('@AddressType' ,'VARCHAR(28)'),
City = x5.Address.value ('@City' ,'VARCHAR(28)'),
State = x5.Address.value ('@State' ,'VARCHAR(28)'),
Zip = x5.Address.value ('@Zip' ,'VARCHAR(28)'),
PhoneNumber = x6.Phone.value ('@PhoneNumber' ,'VARCHAR(28)'),
PhoneType = x6.Phone.value ('@PhoneType' ,'VARCHAR(28)')
FROM @XML.nodes ('AccountDetailsRsp') x1 (AccountDetailsRsp)
OUTER APPLY AccountDetailsRsp.nodes ('PlayerInfo') x2 (PlayerInfo)
OUTER APPLY PlayerInfo.nodes ('AddressList') x3 (AddressList)
OUTER APPLY AddressList.nodes ('PlayerAddress') x4 (PlayerAddress)
OUTER APPLY PlayerAddress.nodes ('Address') x5 (Address) --this needs to be relative to PlayerAddress
OUTER APPLY PlayerAddress.nodes ('Phone') x6 (Phone) --this needs to be relative to PlayerAddress
--Jeff Moden
Change is inevitable... Change for the better is not.
July 9, 2012 at 4:29 pm
That's awesome that it happened to help you (and hopefully others) as well. Of course Matt gets credit for the hard part. I am a frequent reader and searcher of this forum, and have gotten plenty of great info from both of you. Including making good use of the famous tally table! I'll try to help someone (with something simple!) for good karma.
July 9, 2012 at 6:53 pm
Woot! Happy that helped you both. Was a somewhat weak explanation, but apparently got the message across.
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
July 9, 2012 at 7:41 pm
rice.tx (7/9/2012)
That's awesome that it happened to help you (and hopefully others) as well. Of course Matt gets credit for the hard part. I am a frequent reader and searcher of this forum, and have gotten plenty of great info from both of you. Including making good use of the famous tally table! I'll try to help someone (with something simple!) for good karma.
Thanks for the feedback on the Tally Table! You made my day! And, yeah... if you ever have the chance, "Pass it forward".
--Jeff Moden
Change is inevitable... Change for the better is not.
July 9, 2012 at 7:49 pm
Matt Miller (#4) (7/9/2012)
Woot! Happy that helped you both. Was a somewhat weak explanation, but apparently got the message across.
I couldn't even spell "XML" yesterday. I have, however, been playing with the "edge table" that XML makes behind the scenes and understanding what's in that table helped me a lot. I've been trying to get a simple "all-in-one-output" like this and just haven't been able to do it in such a simple fashion. I never though of using cCAs (Cascading Cross Applys) like this.
When I finally got it working the way I wanted, thanks to the two of you, I was disappointed that if I removed all of the "Phone" nodes, that the whole bloody row would disappear from the output. The vein in my forehead was about to pop when I said "What the heck... let's try OUTER APPLY and see if that works." It actually solved a huge problem that I've been trying (remember, I don't know XML) to solve (missing nodes because the folks on the other end don't know how to make well formed XML).
Matt, my ol' friend and fellow giant killer, thanks a million. The hints you gave on this post about the addition Cross Applys and the fact that one of them couldn't/shouldn't refer to the CA just above it were exactly what I needed to understand the problem.
{EDIT} P.S. No giants were actually harmed in the making of this thread. 😀
--Jeff Moden
Change is inevitable... Change for the better is not.
July 10, 2012 at 10:54 am
BTW, I had never heard of CROSS APPLY OR OUTER APPLY until researching this problem. It is a pretty cool addition. I found some language here that was plain english:
http://www.sqlteam.com/article/using-cross-apply-in-sql-server-2005
The real magic happens when you use SQL Server 2005's new APPLY clause. The APPLY clause let's you join a table to a table-valued-function. That let's you write a query like this:
SELECT C.CustomerID, O.SalesOrderID, O.TotalDue
FROM AdventureWorks.Sales.Customer AS C
CROSS APPLY AdventureWorks.dbo.fn_GetTopOrders(C.CustomerID, 3) AS O
ORDER BY CustomerID ASC, TotalDue DESC
The APPLY clause acts like a JOIN without the ON clause comes in two flavors: CROSS and OUTER. The OUTER APPLY clause returns all the rows on the left side (Customers) whether they return any rows in the table-valued-function or not. The columns that the table-valued-function returns are null if no rows are returned. The CROSS APPLY only returns rows from the left side (Customers) if the table-valued-function returns rows.
Notice that I'm just passing in the CustomerID to the function. It returns the TOP 3 rows based on the amount of the order. Since I'm using CROSS APPLY a customer without orders won't appear in the list. I can also pass in a number other than 3 to easily return a different number of orders per customer. Even better I can pass in a different number of orders for each customer. So I could list the top 5 orders for one type of customer but the top 10 for another type of customer. How cool is that?!?
July 10, 2012 at 2:55 pm
rice.tx (7/10/2012)
BTW, I had never heard of CROSS APPLY OR OUTER APPLY until researching this problem. It is a pretty cool addition. I found some language here that was plain english:http://www.sqlteam.com/article/using-cross-apply-in-sql-server-2005
The real magic happens when you use SQL Server 2005's new APPLY clause. The APPLY clause let's you join a table to a table-valued-function. That let's you write a query like this:
SELECT C.CustomerID, O.SalesOrderID, O.TotalDue
FROM AdventureWorks.Sales.Customer AS C
CROSS APPLY AdventureWorks.dbo.fn_GetTopOrders(C.CustomerID, 3) AS O
ORDER BY CustomerID ASC, TotalDue DESC
The APPLY clause acts like a JOIN without the ON clause comes in two flavors: CROSS and OUTER. The OUTER APPLY clause returns all the rows on the left side (Customers) whether they return any rows in the table-valued-function or not. The columns that the table-valued-function returns are null if no rows are returned. The CROSS APPLY only returns rows from the left side (Customers) if the table-valued-function returns rows.
Notice that I'm just passing in the CustomerID to the function. It returns the TOP 3 rows based on the amount of the order. Since I'm using CROSS APPLY a customer without orders won't appear in the list. I can also pass in a number other than 3 to easily return a different number of orders per customer. Even better I can pass in a different number of orders for each customer. So I could list the top 5 orders for one type of customer but the top 10 for another type of customer. How cool is that?!?
The easiest way for me to think of a cross apply is that it does exactly the same thing as a "correlated subquery" except that it allows for multiple rows in the return.
--Jeff Moden
Change is inevitable... Change for the better is not.
July 10, 2012 at 10:42 pm
Even though I don't really know how to spell XML, I believe I've written a nice fast stored procedure that you can feed an XML document to and it will automatically figure out what the column names are for all elements/attributes are and return the data as a "flat" result set.
Would you be willing to pre-review the code before I write an article about it?
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 15 posts - 1 through 15 (of 29 total)
You must be logged in to reply to this topic. Login to reply