Flatten or shredding XML text file into SQL Server table

Question

Post reply

Flatten or shredding XML text file into SQL Server table

acmorenotx

Old Hand

Points: 358
More actions
December 7, 2011 at 10:58 pm

#251763

I have the following XML text file that needs to be flattened or shredded into a SQL Server table:
<ProductFamilyUpdate>
<PriceListId>1111</PriceListId>
<GenerationTime>01:31:33</GenerationTime>
<MajorFamily>
<FamilyName>3G modules and 3G Router Bundles</FamilyName>
<MinorFamily>
<FamilyName>3G Antennas and Cables</FamilyName>
<ProductNumber>3G-AE015-R=</ProductNumber>
</MinorFamily>
<MinorFamily>
<FamilyName>3G Bundles</FamilyName>
<ProductNumber>CISCO881G-A-K9</ProductNumber>
<ProductNumber>CISCO881G-S-K9</ProductNumber>
</MinorFamily>
</MajorFamily>
<MajorFamily>
<FamilyName>CPE Telco Home Networks</FamilyName>
<MinorFamily>
<FamilyName>xDSL Residential Gateway (Telcos)</FamilyName>
<ProductNumber>SP-AR3-RFGW1A2E</ProductNumber>
</MinorFamily>
</MajorFamily>
<MajorFamily>
<FamilyName>Cables and Accessories for All Router and Security Platforms</FamilyName>
<MinorFamily>
<FamilyName>Power Cords</FamilyName>
<ProductNumber>AIR-PWR-CORD-JP=</ProductNumber>
<ProductNumber>AIR-PWR-CORD-NA=</ProductNumber>
</MinorFamily>
</MajorFamily>
<MajorFamily>
<FamilyName>Catalyst 2350 Series</FamilyName>
<MinorFamily>
<FamilyName>TwinGig Converter Module for the Catalyst 2350 Series</FamilyName>
<ProductNumber>CVR-X2-SFP10G=</ProductNumber>
</MinorFamily>
</MajorFamily>
<MajorFamily>
<FamilyName>Catalyst 2900</FamilyName>
<MinorFamily>
<FamilyName>Software Relicenses for Used 2900 Equipment</FamilyName>
<ProductNumber>LL29GL3-IP</ProductNumber>
<ProductNumber>LLC2900SA6EN</ProductNumber>
</MajorFamily>
<MajorFamily>
<FamilyName>Catalyst 2900 Series</FamilyName>
<MinorFamily>
<FamilyName>Catalyst 2900 Series Accessories</FamilyName>
<ProductNumber>STK-RACKMOUNT-1RU=</ProductNumber>
</MinorFamily>
</MajorFamily>
</ProductFamilyUpdate>
The file above has an element structure of:
<ProductFamilyUpdate>
<PriceListId> text contents </PriceListId>
<GenerationTime> text contents </GenerationTime>
<MajorFamily>
<FamilyName> text contents </FamilyName>
<MinorFamily>
<FamilyName> text contents </FamilyName>
<ProductNumber> text contents </ProductNumber>
</MinorFamily>
</MajorFamily>
</ProductFamilyUpdate>
where there is a single instance of the elements <PriceListId> and <GenerationTime>;
and there are no text contents in the (container) elements <MajorFamily> and <MinorFamily>;
and the element name <FamilyName> is repeated in the above container elements.
The file needs to be flattened into a format similar to how Excel would import the XML as a table:
PriceListIdGenerationTimeFamilyNameFamilyName2ProductNumber
111101:31:333G modules and 3G Router Bundles3G Antennas and Cables3G-AE015-R=
111101:31:333G modules and 3G Router Bundles3G BundlesCISCO881G-A-K9
111101:31:333G modules and 3G Router Bundles3G BundlesCISCO881G-S-K9
111101:31:33CPE Telco Home NetworksxDSL Residential Gateway (Telcos)SP-AR3-RFGW1A2E
111101:31:33Cables and Accessories for All Router and Security PlatformsPower CordsAIR-PWR-CORD-JP=
111101:31:33Cables and Accessories for All Router and Security PlatformsPower CordsAIR-PWR-CORD-NA=
111101:31:33Catalyst 2350 SeriesTwinGig Converter Module for the Catalyst 2350 SeriesCVR-X2-SFP10G=
111101:31:33Catalyst 2900Software Relicenses for Used 2900 EquipmentLL29GL3-IP
111101:31:33Catalyst 2900Software Relicenses for Used 2900 EquipmentLLC2900SA6EN
111101:31:33Catalyst 2900 SeriesCatalyst 2900 Series AccessoriesSTK-RACKMOUNT-1RU=
I'm looking for SSMS T-SQL solution utilizing OPENROWSET and CROSS APPLY operators.
Thanks!

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply

okbangas SSChampion Points: 11773 More actions · Answer 1

XML.nodes() in combination with XML.value() should do the trick.

Ole Kristian Velstadbråten Bangås - Virinco - Facebook - Twitter

Concatenating Row Values in Transact-SQL[/url]

acmorenotx Old Hand Points: 358 More actions · Answer 2

I can return five columns with PriceListId, DocDateTime as nvarchar text and the remaining 3 columns include XML for the /MajorFamily/FamilyName, /MinorFamily/FamilyName, and /MinorFamily/ProductNumber container elements, which need to be further shredded but I'm having a difficult time determining how to do that while working with one table.

Any other ideas?

*
* This query returns five columns with XML in PGroup, PSubGroup and ProductNumber
* which need to be further shredded
*
SELECT X.MPL.value('(PriceListId/text()) [1]', 'NVARCHAR(10)') As PriceListID,
X.MPL.query('ThisDocumentGenerationTime').value('.', 'NVARCHAR(20)') As DocDateTime,
X.MPL.query('MajorFamily/FamilyName') As PGroup,
X.MPL.query('//MinorFamily/FamilyName') As PSubGroup,
X.MPL.query('//MinorFamily/ProductNumber') As ProductNumber
FROM (
SELECT CAST(x AS XML)
FROM OPENROWSET(
BULK '\\psf\Home\Import\1134_Family_New.xml',
SINGLE_BLOB) AS T(x)
) AS T(x)
CROSS APPLY x.nodes('/ProductFamilyUpdate') AS X(MPL);
*
* The APPLY is at the root node /ProductFamilyUpdate
*

Adi Cohn-120898 SSC-Dedicated Points: 33944 More actions · Answer 3

I couldn’t find some of the elements that you wrote about in the XML that was in your message, but my guess is that the problem that you had was because you want to get the value of different elements that have different path in the XML. If I’m correct in my assumption, then one way to do it is to work with the nodes method and get to the lowest level of the needed xml, and then climb back in XML using the value method. The code bellow is based on the XML that you had in your message (by the way it had a missing element it it) and shows how to do it:

declare @x xml

set @x =

'<ProductFamilyUpdate>

<PriceListId>1111</PriceListId>

<GenerationTime>01:31:33</GenerationTime>

<MajorFamily>

<FamilyName>3G modules and 3G Router Bundles</FamilyName>

<MinorFamily>

<FamilyName>3G Antennas and Cables</FamilyName>

<ProductNumber>3G-AE015-R=</ProductNumber>

</MinorFamily>

<MinorFamily>

<FamilyName>3G Bundles</FamilyName>

<ProductNumber>CISCO881G-A-K9</ProductNumber>

<ProductNumber>CISCO881G-S-K9</ProductNumber>

</MinorFamily>

</MajorFamily>

<MajorFamily>

<FamilyName>CPE Telco Home Networks</FamilyName>

<MinorFamily>

<FamilyName>xDSL Residential Gateway (Telcos)</FamilyName>

<ProductNumber>SP-AR3-RFGW1A2E</ProductNumber>

</MinorFamily>

</MajorFamily>

<MajorFamily>

<FamilyName>Cables and Accessories for All Router and Security Platforms</FamilyName>

<MinorFamily>

<FamilyName>Power Cords</FamilyName>

<ProductNumber>AIR-PWR-CORD-JP=</ProductNumber>

<ProductNumber>AIR-PWR-CORD-NA=</ProductNumber>

</MinorFamily>

</MajorFamily>

<MajorFamily>

<FamilyName>Catalyst 2350 Series</FamilyName>

<MinorFamily>

<FamilyName>TwinGig Converter Module for the Catalyst 2350 Series</FamilyName>

<ProductNumber>CVR-X2-SFP10G=</ProductNumber>

</MinorFamily>

</MajorFamily>

<MajorFamily>

<FamilyName>Catalyst 2900</FamilyName>

<MinorFamily>

<FamilyName>Software Relicenses for Used 2900 Equipment</FamilyName>

<ProductNumber>LL29GL3-IP</ProductNumber>

<ProductNumber>LLC2900SA6EN</ProductNumber>

</MinorFamily>

</MajorFamily>

<MajorFamily>

<FamilyName>Catalyst 2900 Series</FamilyName>

<MinorFamily>

<FamilyName>Catalyst 2900 Series Accessories</FamilyName>

<ProductNumber>STK-RACKMOUNT-1RU=</ProductNumber>

</MinorFamily>

</MajorFamily>

</ProductFamilyUpdate>'

select @x

select NewTbl.XMLCol.value('(../../PriceListId)[1]', 'int') as PriceListID,

NewTbl.XMLCol.value('(../../GenerationTime)[1]', 'char(8)'),

NewTbl.XMLCol.value('(../FamilyName)[1]','varchar(50)') as MajorFamilyName,

NewTbl.XMLCol.value('(FamilyName)[1]','varchar(50)') as MinorFamilyName

from @x.nodes('ProductFamilyUpdate/MajorFamily/MinorFamily') NewTbl (XMLCol)

Adi

--------------------------------------------------------------
To know how to ask questions and increase the chances of getting asnwers:
http://www.sqlservercentral.com/articles/Best+Practices/61537/

For better answers on performance questions, click on the following...
http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

okbangas SSChampion Points: 11773 More actions · Answer 4

Ah, welcome to the sweet beauty of xml. Sorry I don't have time to write the code for you. You're actually doing things right so far. When you get the XML values for Family and so forth, these are, well... xml. And, you can use exactly the same functions (nodes, query etc) on these columns to further shred. Personally, I prefer to use Common Table Expressions. My code sample on Being nuts about scans shows how this can be done.

Ole Kristian Velstadbråten Bangås - Virinco - Facebook - Twitter

Concatenating Row Values in Transact-SQL[/url]

acmorenotx Old Hand Points: 358 More actions · Answer 5

Adi Cohn-120898

select NewTbl.XMLCol.value('(../../PriceListId)[1]', 'int') as PriceListID,
NewTbl.XMLCol.value('(../../GenerationTime)[1]', 'char(8)'),
NewTbl.XMLCol.value('(../FamilyName)[1]','varchar(50)') as MajorFamilyName,
NewTbl.XMLCol.value('(FamilyName)[1]','varchar(50)') as MinorFamilyName
from @x.nodes('ProductFamilyUpdate/MajorFamily/MinorFamily') NewTbl (XMLCol)

Your simply easy, yet elegant solution is very close except that the last node is missing. The structure contains multiple ProductNumber elements under the MinorFamily container element. Your query returns only the first ProductNumber under the MinorFamily.

Also, I totally whiffed on my understanding of the node structure. You have the relative path as (../../PriceListId) even though it is at the top level of the node structure. Can you explain this please?

okbangas SSChampion Points: 11773 More actions · Answer 6

nodes return a virtual node, or if you want, a default path in the XML tree structure. This is most likely NOT the root of the document, and as such it may make sense to retrieve information one level up (../) or even multiple levels up. If you use a path like (/something) after using nodes(), it is essentially as if you did not use nodes, as you override the default path.

Ole Kristian Velstadbråten Bangås - Virinco - Facebook - Twitter

Concatenating Row Values in Transact-SQL[/url]

acmorenotx Old Hand Points: 358 More actions · Answer 7

With one change to Adi SQL, I have the solution

SELECT X.MPL.value('(../../PriceListId)[1]', 'nvarchar(4)') As PriceListID,
X.MPL.value('(../../ThisDocumentGenerationTime)[1]', 'nvarchar(20)') As DocDateTime,
X.MPL.value('(../FamilyName)[1]', 'nvarchar(50)') As PGroup,
X.MPL.value('(FamilyName)[1]', 'nvarchar(50)') As PSubGroup,
Y.MPL.value('(.)[1]', 'nvarchar(50)') As ProductNumber
FROM (
SELECT CAST(x AS XML)
FROM OPENROWSET(
BULK '\\psf\Home\Import\1134_Family_New.xml',
SINGLE_BLOB) AS T(x)
) AS T(x)
CROSS APPLY x.nodes('ProductFamilyUpdate/MajorFamily/MinorFamily') AS X(MPL)
CROSS APPLY MPL.nodes('ProductNumber') As Y(MPL);

The additional CROSS APPLY allowed me to shred the last element, ProductNumber, which was contained in the MinorFamily element multiple times.

I'm still not sure why X.MPL.value('(../../PriceListId)[1]', 'nvarchar(4)') As PriceListID is referenced this way in the SELECT. Looking back at the sample of the XML data above, one can see that PriceListId the first node below the root, ProductFamilyUpdate, with ProductNumber as the bottom most node. Yet the "value" method references PriceListId as if it's at the bottom (as I understand it) using (../../PriceListId) and MinorFamily[FamilyName] as if it's at the top using (FamilyName). I would think the reference to PSubGroup would pull the first occurrence of FamilyName, which is under MajorFamily, instead using this reference.

Adi Cohn-120898 SSC-Dedicated Points: 33944 More actions · Answer 8

You have to show us the XML that you work with. In your question you reference an XML file that has some elements that don't exist in the XML that you posted in the original question, so it is hard to answer the question. Can you post an XML with the same structure as your question?

Adi

edited: Sorry, I’ve just noticed that you referenced the name that you gave to the columns that were created with the value method. I apologize and pleas disregard what I wrote.

Adi

--------------------------------------------------------------
To know how to ask questions and increase the chances of getting asnwers:
http://www.sqlservercentral.com/articles/Best+Practices/61537/

For better answers on performance questions, click on the following...
http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

Adi Cohn-120898 SSC-Dedicated Points: 33944 More actions · Answer 9

I think that what you misunderstand is that the 2 dots and slash (../) symbol is actually for going up one level and not going down one level (like the 2 dots in DOS – cd .. takes up one step in the files tree). The path that you specify in the nodes method is the reference point that from there it will go up.

Adi

--------------------------------------------------------------
To know how to ask questions and increase the chances of getting asnwers:
http://www.sqlservercentral.com/articles/Best+Practices/61537/

For better answers on performance questions, click on the following...
http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

okbangas SSChampion Points: 11773 More actions · Answer 10

Indeed, ../ is one level towards the root, or up in the root structure to its parent element. If you start the path with / it starts at the root of the document. If you start with //, for instance //Book, you look for a Book element anywhere in the XML document (or fragment).

Ole Kristian Velstadbråten Bangås - Virinco - Facebook - Twitter

Concatenating Row Values in Transact-SQL[/url]