October 31, 2014 at 12:34 pm
I need to parse xml data using xquery. I have tried my best to replicate the situation. I need to look for similar looking element name and store the value in the same column.The similar elements could go from 1 to n.
Below is the xml and the expected output.
<ProductCategory>1
<Dairy1>Milk</Dairy1>
<Dairy1>yougurt</Dairy1>
<Dairy2>Cheese</Dairy2
<Dairy3>Butter </Dairy3>
<Dairy>CreamCheese </Dairy>
<Fruit>Apple </Fruit>
<Fruit>Banana </Fruit>
<Fruit1>Orange </Fruit1>
<Fruit1>GrapeFruit</Fruit1>
<Fruit1>Garpes</Fruit1>
<Fruit2>Strawberry</Fruit2>
<Vegetable>Onion </Vegetable>
<others>Walnut</others>
</ProductCategory>
ProductCategory Dairy Fruit Vegetable Others
1 Milk apple onion walnut
1 yogurtBanana NULL NULL
1 chesse orange NULL NULL
1butter grapefruit NULL NULL
1creamcheese grapes NULL NULL
1 NULL Strawberry NULL NULL
November 1, 2014 at 3:11 am
Looking at the XML and the expected results, it is apparent that something is missing in the description;-)
The code below is slightly verbose in order to be self explanatory, note that if column names are dynamic then the whole query has to be dynamic.
This should at least get you started.
😎
USE tempdb;
GO
SET NOCOUNT ON;
DECLARE @TXML XML = '<ProductCategory>1
<Dairy1>Milk</Dairy1>
<Dairy1>yougurt</Dairy1>
<Dairy2>Cheese</Dairy2>
<Dairy3>Butter </Dairy3>
<Dairy>CreamCheese </Dairy>
<Fruit>Apple </Fruit>
<Fruit>Banana </Fruit>
<Fruit1>Orange </Fruit1>
<Fruit1>GrapeFruit</Fruit1>
<Fruit1>Garpes</Fruit1>
<Fruit2>Strawberry</Fruit2>
<Vegetable>Onion </Vegetable>
<others>Walnut</others>
</ProductCategory>'
;
;WITH BASE_DATA AS
(
SELECT
ROW_NUMBER() OVER
(
PARTITION BY (SELECT NULL)
ORDER BY (SELECT NULL)
) AS ELEMENT_RID
,PRODUCTCATEGORY.DATA.value('text()[1]', 'VARCHAR(10)') AS PRODCAT_VAL
,ELEMENT.DATA.value('local-name(.)', 'VARCHAR(50)') AS ELEMENT_NAME
,ELEMENT.DATA.value('.[1]', 'VARCHAR(100)') AS ELEMENT_VALUE
FROM @TXML.nodes('ProductCategory') AS PRODUCTCATEGORY(DATA)
OUTER APPLY PRODUCTCATEGORY.DATA.nodes('*') AS ELEMENT(DATA)
)
,MULTI_GROUP AS
(
SELECT
BD.ELEMENT_RID
,BD.ELEMENT_NAME
,BE.ELEMENT_NAME AS GROUP_NAME
FROM BASE_DATA BD
CROSS APPLY BASE_DATA BE
WHERE BD.ELEMENT_NAME <> BE.ELEMENT_NAME
AND BD.ELEMENT_NAME LIKE BE.ELEMENT_NAME + '%'
)
,ALL_GROUPS AS
(
SELECT DISTINCT
BD.ELEMENT_NAME AS GROUP_NAME
FROM BASE_DATA BD
WHERE BD.ELEMENT_NAME NOT IN
(
SELECT
MG.ELEMENT_NAME
FROM MULTI_GROUP MG
)
)
,NUMBERED_GROUPS AS
(
SELECT
ROW_NUMBER() OVER
(
PARTITION BY (SELECT NULL)
ORDER BY (SELECT NULL)
) AS AG_RID
,AG.GROUP_NAME
FROM ALL_GROUPS AG
)
,COL_ROW_ENUM_SET AS
(
SELECT
BD.ELEMENT_RID
,BD.PRODCAT_VAL
,NG.GROUP_NAME
,NG.AG_RID
,ROW_NUMBER() OVER
(
PARTITION BY NG.GROUP_NAME
ORDER BY BD.ELEMENT_RID
) AS EL_GR_RID
,BD.ELEMENT_VALUE
FROM BASE_DATA BD
OUTER APPLY NUMBERED_GROUPS NG
WHERE CHARINDEX(NG.GROUP_NAME, BD.ELEMENT_NAME,1) = 1
)
,ROWS_AND_COLS AS
(
SELECT DISTINCT
NG.AG_RID AS COL_NO
,CRES.EL_GR_RID AS ROW_NO
FROM NUMBERED_GROUPS NG
OUTER APPLY COL_ROW_ENUM_SET CRES
)
SELECT
MAX(CRES.PRODCAT_VAL) AS ProductCategory
,MAX(CASE WHEN RAC.COL_NO = 1 THEN CRES.ELEMENT_VALUE END) AS COL_01
,MAX(CASE WHEN RAC.COL_NO = 2 THEN CRES.ELEMENT_VALUE END) AS COL_02
,MAX(CASE WHEN RAC.COL_NO = 3 THEN CRES.ELEMENT_VALUE END) AS COL_03
,MAX(CASE WHEN RAC.COL_NO = 4 THEN CRES.ELEMENT_VALUE END) AS COL_04
,MAX(CASE WHEN RAC.COL_NO = 5 THEN CRES.ELEMENT_VALUE END) AS COL_05
,MAX(CASE WHEN RAC.COL_NO = 6 THEN CRES.ELEMENT_VALUE END) AS COL_06
FROM ROWS_AND_COLS RAC
LEFT OUTER JOIN COL_ROW_ENUM_SET CRES
ON RAC.COL_NO = CRES.AG_RID
AND RAC.ROW_NO = CRES.EL_GR_RID
GROUP BY RAC.ROW_NO;
Results
ProductCategory COL_01 COL_02 COL_03 COL_04 COL_05 COL_06
--------------- ------------ ----------- -------- -------- -------- -------
1 Milk Apple Walnut Onion NULL NULL
1 yougurt Banana NULL NULL NULL NULL
1 Cheese Orange NULL NULL NULL NULL
1 Butter GrapeFruit NULL NULL NULL NULL
1 CreamCheese Garpes NULL NULL NULL NULL
1 NULL Strawberry NULL NULL NULL NULL
November 7, 2014 at 5:30 am
I know that this thread is a week old and Eirikur has already provided a potential solution but I'd thought I'd add some comments + another solution as well. Firstly the XML you are working with is pretty poor in structure and if that is indeed what you have to work with, I'd look to resolve the schema of that first otherwise you are always going to be fighting the xml with some creative querying. Anyway here is another way of approaching the problem by shredding the xml and then using a pivot. This is too based on some assumptions on the xml schema design.
DECLARE @xml XML
SET @xml = '<ProductCategory>1
<Dairy1>Milk</Dairy1>
<Dairy1>yougurt</Dairy1>
<Dairy2>Cheese</Dairy2>
<Dairy3>Butter </Dairy3>
<Dairy>CreamCheese </Dairy>
<Fruit>Apple </Fruit>
<Fruit>Banana </Fruit>
<Fruit1>Orange </Fruit1>
<Fruit1>GrapeFruit</Fruit1>
<Fruit1>Garpes</Fruit1>
<Fruit2>Strawberry</Fruit2>
<Vegetable>Onion </Vegetable>
<others>Walnut</others>
</ProductCategory>';
WITH x AS
(
SELECT PC.c.value('text()[1]', 'varchar(10)') AS 'ProductCategory'
, d.c.value('upper-case(substring(local-name(.),1,1))', 'char(1)') AS 'ProductType'
, d.c.value('text()[1]', 'varchar(100)') AS 'ProductValue'
FROM @xml.nodes('/ProductCategory') PC(c)
CROSS APPLY PC.c.nodes('*') d(c)
)
SELECT ProductCategory, [D] AS 'Dairy', [F] AS 'Fruit', [V] AS 'Vegtables', [O] AS 'Others'
FROM
(
SELECT ROW_NUMBER() OVER (PARTITION BY ProductCategory, ProductType ORDER BY (SELECT NULL)) AS RowId, *
FROM x
) AS Src
PIVOT
(
MAX(ProductValue)
FOR ProductType IN ([D], [F], [O], [V])
) AS Pvt
Viewing 3 posts - 1 through 2 (of 2 total)
You must be logged in to reply to this topic. Login to reply