Handling XML data and formatting it into separate tables

  • I have XML data in a table, and for each row the XML is taken from a different source table. The data looks like this:

    TableA

    Tab_id  xml_data
    1       xml_data1   -- e.g. Table 1
    2       xml_data2   -- e.g. Table 2

    xml_data1

    <row>

    <col0> EMP_ID </col0> --COLUMN NAMES FOR DATA

    <col1> EMP_NAME </col1>--COLUMN NAMES FOR DATA

    <col2> EMP_COUNTRY </col2>--COLUMN NAMES FOR DATA

    </row>

    <row>

    <col0> 1001 </col0>

    <col1> AAA </col1>

    <col2> USA </col2>

    </row>

    <row>

    <col0> 1002 </col0>

    <col1> BBB </col1>

    <col2> CANADA </col2>

    </row>

    xml_data2

    <row>

    <col1> HR_ID </col1>--COLUMN NAMES FOR DATA

    <col2> HR_NAME </col2>--COLUMN NAMES FOR DATA

    <col3> HR_ADDRESS </col3>--COLUMN NAMES FOR DATA

    <col4> HR_CITY </col4>--COLUMN NAMES FOR DATA

    </row>

    <row>

    <col1> 100 </col1>

    <col2> GATES </col2>

    <col3> MS </col3>

    <col4> WASHINGTON </col4>

    </row>

    <row>

    <col1> 101 </col1>

    <col2> MALLA </col2>

    <col3> KF </col3>

    <col4> NEWYORK </col4>

    </row>

    Actual table view of xml_data1 will be:

    EMP_ID  EMP_NAME  EMP_COUNTRY
    1001    AAA       USA
    1002    BBB       CANADA

    Actual table view of xml_data2 will be:

    HR_ID  HR_NAME  HR_ADDRESS  HR_CITY
    100    GATES    MS          WASHINGTON
    101    MALLA    KF          NEWYORK

    Here I need an output like

    Table name: TableA_Columns

    Tab_Id  Column_Id  Column_Name  Position
    1       1111       EMP_ID       1
    1       1112       EMP_NAME     2
    1       1113       EMP_COUNTRY  3
    2       2221       HR_ID        1
    2       2222       HR_NAME      2
    2       2223       HR_ADDRESS   3
    2       2224       HR_CITY      4

    Table name: TableA_Col_Values

    Tab_Id  Column_Id  Value_Id  Value       Row_No
    1       1111       1000001   1001        1
    1       1112       1000002   AAA         1
    1       1113       1000003   USA         1
    1       1111       1000004   1002        2
    1       1112       1000005   BBB         2
    1       1113       1000006   CANADA      2
    2       2221       1000007   100         1
    2       2222       1000008   GATES       1
    2       2223       1000009   MS          1
    2       2224       1000010   WASHINGTON  1
    2       2221       1000011   101         2
    2       2222       1000012   MALLA       2
    2       2223       1000013   KF          2
    2       2224       1000014   NEWYORK     2
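    The two target tables would look roughly like this (a DDL sketch only; the data types are indicative, not specified anywhere):

    CREATE TABLE TableA_Columns
    (
        Tab_Id      INT          NOT NULL,
        Column_Id   INT          NOT NULL,
        Column_Name VARCHAR(128) NOT NULL,
        Position    INT          NOT NULL
    );

    CREATE TABLE TableA_Col_Values
    (
        Tab_Id    INT          NOT NULL,
        Column_Id INT          NOT NULL,
        Value_Id  INT          NOT NULL,
        Value     VARCHAR(100) NOT NULL,
        Row_No    INT          NOT NULL
    );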

    Please comment if you need any clarification.

    Thanks a lot in advance.

  • Here is a quick suggestion towards a solution, let us know if you need more assistance.

    😎

    USE tempdb;
    GO
    SET NOCOUNT ON;

    DECLARE @XQUERY_STR NVARCHAR(MAX) = N'';
    DECLARE @PARAM_STR  NVARCHAR(MAX) = N'@TXML1 XML';
    DECLARE @TXML1 XML = N'<row>
    <col0> EMP_ID </col0> --COLUMN NAMES FOR DATA
    <col1> EMP_NAME </col1>--COLUMN NAMES FOR DATA
    <col2> EMP_COUNTRY </col2>--COLUMN NAMES FOR DATA
    </row>
    <row>
    <col0> 1001 </col0>
    <col1> AAA </col1>
    <col2> USA </col2>
    </row>
    <row>
    <col0> 1002 </col0>
    <col1> BBB </col1>
    <col2> CANADA </col2>
    </row>';

    -- FIRST ROW CONTAINS THE COLUMN NAMES, LIMITING BASE_DATA TO RRID = 1
    ;WITH BASE_DATA AS
    (
        SELECT
            ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS RRID
           ,XROW.DATA.query('*') AS RXML
        FROM @TXML1.nodes('//row') AS XROW(DATA)
    )
    SELECT @XQUERY_STR = N';WITH BASE_DATA AS
    (
    SELECT
    ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS RRID
    ,XROW.DATA.query(''*'') AS RXML
    FROM @TXML1.nodes(''//row'') AS XROW(DATA)
    )
    SELECT
    ' +
    STUFF(
        (SELECT
            N',' + UPPER(BDXML.DATA.value('local-name(.)','varchar(100)')) + N'.DATA.value(''(text())[1]'',''varchar(100)'') AS '
            + BDXML.DATA.value('(text())[1]','varchar(100)') + N'
    '
         FROM BASE_DATA BD
         CROSS APPLY BD.RXML.nodes('/*') AS BDXML(DATA)
         WHERE BD.RRID = 1
         FOR XML PATH(''),TYPE
        ).value('(./text())[1]','VARCHAR(MAX)'),1,1,'')
    + N'FROM BASE_DATA BD
    ' +
    STUFF(
        (SELECT
            N' CROSS APPLY BD.RXML.nodes(''' + BDXML.DATA.value('local-name(.)','varchar(100)') + N''') AS ' + UPPER(BDXML.DATA.value('local-name(.)','varchar(100)')) + N'(DATA)
    '
         FROM BASE_DATA BD
         CROSS APPLY BD.RXML.nodes('/*') AS BDXML(DATA)
         WHERE BD.RRID = 1
         FOR XML PATH(''),TYPE
        ).value('(./text())[1]','VARCHAR(MAX)'),1,1,'') +
    N'
    WHERE BD.RRID > 1
    OPTION (MAXDOP 1);
    ';

    EXEC sp_executesql @XQUERY_STR, @PARAM_STR, @TXML1;

    Output

    EMP_ID EMP_NAME EMP_COUNTRY

    ------- --------- ------------

    1001 AAA USA

    1002 BBB CANADA
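    For reference, with the sample document above the dynamic statement that ends up in @XQUERY_STR comes out roughly like this (hand-reconstructed, so the whitespace differs slightly); the first <row> supplies the column aliases and the WHERE clause then skips it, so only the data rows are returned:

    ;WITH BASE_DATA AS
    (
    SELECT
    ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS RRID
    ,XROW.DATA.query('*') AS RXML
    FROM @TXML1.nodes('//row') AS XROW(DATA)
    )
    SELECT
    COL0.DATA.value('(text())[1]','varchar(100)') AS EMP_ID
    ,COL1.DATA.value('(text())[1]','varchar(100)') AS EMP_NAME
    ,COL2.DATA.value('(text())[1]','varchar(100)') AS EMP_COUNTRY
    FROM BASE_DATA BD
    CROSS APPLY BD.RXML.nodes('col0') AS COL0(DATA)
    CROSS APPLY BD.RXML.nodes('col1') AS COL1(DATA)
    CROSS APPLY BD.RXML.nodes('col2') AS COL2(DATA)
    WHERE BD.RRID > 1
    OPTION (MAXDOP 1);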

  • First, and by FAR the most important, let me say that this should be done OUTSIDE of SQL Server (unless you are going to be using the set in another query on the same box immediately).

    Having said that, a quick Binoogle search for "sql server xml data to table" will turn up some very useful and appropriate links.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • TheSQLGuru (4/4/2016)


    First, and by FAR the most important, let me say that this should be done OUTSIDE of SQL Server (unless you are going to be using the set in another query on the same box immediately).

    Having said that, a quick Binoogle search for "sql server xml data to table" will turn up some very useful and appropriate links.

    It's not too hard to handle in SQL Server, and performance is OK if the sets are small; I wouldn't recommend handling large or huge sets in SQL Server, though.

    😎

    Edit: Just tested 10.000 row elements on my old laptop, finished in less than a second.

  • Eirikur Eiriksson (4/4/2016)


    Here is a quick suggestion towards a solution, let us know if you need more assistance.

    😎

    [code and output snipped; quoted in full above]

    This does not match the expected output, and you don't need dynamic SQL to get the expected output. I've only given the beginnings of the output, because the source of the column_id hasn't been specified. (I can make an educated guess, but I'll leave it up to the OP to try to figure it out on their own.)

    DECLARE @t TABLE(tab_id INT, xml_data XML);

    INSERT @t(tab_id, xml_data)
    SELECT v.tab_id, v.xml_data
    FROM (
        VALUES
        (1, CAST('<row>
    <col0> EMP_ID </col0> --COLUMN NAMES FOR DATA
    <col1> EMP_NAME </col1>--COLUMN NAMES FOR DATA
    <col2> EMP_COUNTRY </col2>--COLUMN NAMES FOR DATA
    </row>
    <row>
    <col0> 1001 </col0>
    <col1> AAA </col1>
    <col2> USA </col2>
    </row>
    <row>
    <col0> 1002 </col0>
    <col1> BBB </col1>
    <col2> CANADA </col2>
    </row>
    ' AS XML)),
        (2, '<row>
    <col1> HR_ID </col1>--COLUMN NAMES FOR DATA
    <col2> HR_NAME </col2>--COLUMN NAMES FOR DATA
    <col3> HR_ADDRESS </col3>--COLUMN NAMES FOR DATA
    <col4> HR_CITY </col4>--COLUMN NAMES FOR DATA
    </row>
    <row>
    <col1> 100 </col1>
    <col2> GATES </col2>
    <col3> MS </col3>
    <col4> WASHINGTON </col4>
    </row>
    <row>
    <col1> 101 </col1>
    <col2> MALLA </col2>
    <col3> KF </col3>
    <col4> NEWYORK </col4>
    </row>')
    ) v(tab_id, xml_data);

    -- Column names: shred the first <row> of each document
    SELECT v.*, t.c.value('text()[1]', 'NVARCHAR(15)'),
           ROW_NUMBER() OVER(PARTITION BY v.tab_id ORDER BY t.c)
    FROM @t AS v
    CROSS APPLY v.xml_data.nodes('row[1]/*') AS t(c);

    -- Column values: shred every <row> after the first
    SELECT v.*, c.c.value('text()[1]', 'NVARCHAR(15)'),
           DENSE_RANK() OVER(PARTITION BY v.tab_id ORDER BY r.c),
           ROW_NUMBER() OVER(PARTITION BY v.tab_id, r.c ORDER BY c.c)
    FROM @t AS v
    CROSS APPLY v.xml_data.nodes('row[position()>1]') AS r(c)
    CROSS APPLY r.c.nodes('./*') AS c(c);
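    To take it one step closer to the requested output, here is a rough sketch of the same approach extended to the full TableA_Columns / TableA_Col_Values shape, run in the same batch as the @t population above. The Column_Id formula (tab_id * 1110 + position, which happens to reproduce 1111...1113 and 2221...2224) and the Value_Id running sequence are only guesses, since their source hasn't been specified; the ordering uses local-name() and assumes fewer than ten colN elements per row:

    -- TableA_Columns: shred the first <row>, which holds the column names
    SELECT  v.tab_id AS Tab_Id,
            v.tab_id * 1110
              + ROW_NUMBER() OVER(PARTITION BY v.tab_id
                                  ORDER BY t.c.value('local-name(.)','varchar(10)')) AS Column_Id,  -- guessed formula
            LTRIM(RTRIM(t.c.value('text()[1]','varchar(100)'))) AS Column_Name,
            ROW_NUMBER() OVER(PARTITION BY v.tab_id
                              ORDER BY t.c.value('local-name(.)','varchar(10)'))     AS Position
    FROM @t AS v
    CROSS APPLY v.xml_data.nodes('row[1]/*') AS t(c);

    -- TableA_Col_Values: shred every <row> after the first, numbering the data rows per document
    SELECT  v.tab_id AS Tab_Id,
            v.tab_id * 1110
              + ROW_NUMBER() OVER(PARTITION BY v.tab_id, r.rn
                                  ORDER BY c.c.value('local-name(.)','varchar(10)')) AS Column_Id,
            1000000
              + ROW_NUMBER() OVER(ORDER BY v.tab_id, r.rn,
                                  c.c.value('local-name(.)','varchar(10)'))          AS Value_Id,   -- guessed sequence
            LTRIM(RTRIM(c.c.value('text()[1]','varchar(100)'))) AS Value,
            r.rn AS Row_No
    FROM @t AS v
    CROSS APPLY (SELECT rx.c.query('*') AS x,
                        ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS rn   -- ordinal of each data row
                 FROM v.xml_data.nodes('row[position()>1]') AS rx(c)) AS r
    CROSS APPLY r.x.nodes('*') AS c(c);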

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • Eirikur Eiriksson (4/4/2016)


    Edit: Just tested 10.000 row elements on my old laptop, finished in less than a second.

    If all the rows were identical, then stats will only have one row in it and XML is smart enough to know that and only process the row once as a "run time constant". Such "grooved" data is the same reason why many people think that an XML splitter will beat any and all Tally Table style splitters. I'm currently writing an article on this all too common performance testing problem.

    If the rows had the equivalent of a "unique" cardinality for each row in the 10,000, then ok.
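    For anyone who wants to check this for themselves, here is one possible way to build a test document where every row is unique instead of 10,000 copies of the same row (just a sketch; the row count and element names simply mirror the sample above):

    -- Build an XML document of 10,000 <row> elements with unique values in every row,
    -- so the shredding can't be collapsed into a single "run time constant" lookup.
    -- (The header <row> with the column names is omitted here for brevity.)
    DECLARE @TestXML XML;

    WITH N(n) AS
    (
        SELECT TOP (10000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
        FROM sys.all_columns AS a CROSS JOIN sys.all_columns AS b
    )
    SELECT @TestXML =
    (
        SELECT  1000 + n                             AS col0
               ,'NAME_'    + CONVERT(VARCHAR(10), n) AS col1
               ,'COUNTRY_' + CONVERT(VARCHAR(10), n) AS col2
        FROM N
        FOR XML PATH('row'), TYPE
    );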

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (4/4/2016)


    Eirikur Eiriksson (4/4/2016)


    Edit: Just tested 10.000 row elements on my old laptop, finished in less than a second.

    If all the rows were identical, then stats will only have one row in it and XML is smart enough to know that and only process the row once as a "run time constant". Such "grooved" data is the same reason why many people think that an XML splitter will beat any and all Tally Table style splitters. I'm currently writing an article on this all too common performance testing problem.

    If the rows had the equivalent of a "unique" cardinality for each row in the 10,000, then ok.

    10,000 rows in a second is not particularly good performance but not bad either if one is dynamically constructing a table from text/xml. My point is that there are limits to such solutions.

    😎

    On the XML splitter, have you tested a combination of the two (8K and XML) by diverting each based on the length?
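    Something along these lines is what I have in mind; dbo.XmlSplit is only a placeholder name for an XML-based splitter that returns the same (ItemNumber, Item) shape as DelimitedSplit8K:

    -- Hypothetical dispatcher: short strings go to the 8K splitter, long ones to the XML splitter.
    CREATE FUNCTION dbo.SplitByLength (@pString VARCHAR(MAX), @pDelimiter CHAR(1))
    RETURNS TABLE
    AS
    RETURN
        SELECT s.ItemNumber, s.Item
        FROM   dbo.DelimitedSplit8K(CONVERT(VARCHAR(8000), @pString), @pDelimiter) AS s
        WHERE  DATALENGTH(@pString) <= 8000
        UNION ALL
        SELECT x.ItemNumber, x.Item
        FROM   dbo.XmlSplit(@pString, @pDelimiter) AS x    -- placeholder XML-based splitter
        WHERE  DATALENGTH(@pString) > 8000;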

  • drew.allen (4/4/2016)


    Eirikur Eiriksson (4/4/2016)

    Here is a quick suggestion towards a solution, let us know if you need more assistance.

    [code and output snipped; quoted in full above]

    This does not match the expected output, and you don't need dynamic SQL to get the expected output. I've only given the beginnings of the output, because the source of the column_id hasn't been specified. (I can make an educated guess, but I'll leave it up to the OP to try to figure it out on their own.)

    [code snipped; see Drew's post above]

    Drew

    As I posted earlier, it is a "suggestion towards a solution"; in fact, the desired output is the output of the first CTE without the filter.

    😎

    I expect the OP to read it and ask further questions if needed.

  • Eirikur Eiriksson (4/4/2016)



    10,000 rows in a second is not particularly good performance but not bad either if one is dynamically constructing a table from text/xml. My point is that there are limits to such solutions.

    😎

    On the XML splitter, have you tested a combination of the two (8K and XML) by diverting each based on the length?

    What I worry about is when 10,000 rows appear to have a 1 second performance and then you put it in production and BOOM! The example I use in my presentation works 100K rows of "grooved" data in just 3.3 seconds. With "real" data with some good cardinality according to stats, the same code takes over 8 minutes for the same length of data and same number of elements.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (4/4/2016)



    What I worry about is when 10,000 rows appear to have a 1 second performance and then you put it in production and BOOM! The example I use in my presentation works 100K rows of "grooved" data in just 3.3 seconds. With "real" data with some good cardinality according to stats, the same code takes over 8 minutes for the same length of data and same number of elements.

    I second the concern here and am looking forward to reading your article.

    😎

  • Eirikur Eiriksson (4/4/2016)


    As I posted earlier, "suggestion towards a solution", in fact the desired output is the output of the first CTE without the filter

    No, the OP is asking for data in an EAV format, and at no point do you return data in an EAV format. Just because you would never want to return the data in an EAV format doesn't mean that the OP doesn't have a legitimate reason for returning the data in that format.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • drew.allen (4/4/2016)


    Eirikur Eiriksson (4/4/2016)


    As I posted earlier, "suggestion towards a solution", in fact the desired output is the output of the first CTE without the filter

    No, the OP is asking for data in an EAV format, and at no point do you return data in an EAV format. Just because you would never want to return the data in an EAV format doesn't mean that the OP doesn't have a legitimate reason for returning the data in that format.

    Drew

    Not certain what you are going on about here; the first CTE does return the row elements as XML, and the code is a clear example of how to EAV those.

    😎

    I'm not trying to feed anyone, but trying to teach them how to catch a fish.

  • Eirikur Eiriksson (4/4/2016)


    drew.allen (4/4/2016)


    Eirikur Eiriksson (4/4/2016)


    As I posted earlier, "suggestion towards a solution", in fact the desired output is the output of the first CTE without the filter

    No, the OP is asking for data in an EAV format, and at no point do you return data in an EAV format. Just because you would never want to return the data in an EAV format doesn't mean that the OP doesn't have a legitimate reason for returning the data in that format.

    Drew

    Not certain what you are going on about here, the first CTE does return the row elements as xml, clear example of how to EAV those in the code

    😎

    Not trying to feed anyone but trying to teach them how to catch a fish.

    Let me put it another way. You are returning data in a pivoted format when the OP asked for data in an unpivoted format.

    Each row contains xml for different tables with differing numbers of columns and differing column names. With the unpivoted approach, it's easy to combine the different tables in the same result set, but it's much more difficult to do that with the pivoted approach.
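    To make that concrete (the table names here are purely illustrative): stacking the pivoted shapes fails because the column lists don't line up, while the unpivoted (EAV) shape is the same no matter how many columns each source table has.

    -- Pivoted: cannot be combined, the two shapes have different column lists
    SELECT EMP_ID, EMP_NAME, EMP_COUNTRY       FROM dbo.Emp_Pivoted   -- 3 columns
    UNION ALL
    SELECT HR_ID, HR_NAME, HR_ADDRESS, HR_CITY FROM dbo.HR_Pivoted;   -- 4 columns -> Msg 205

    -- Unpivoted (EAV): one fixed shape for every source table
    SELECT Tab_Id, Column_Id, Value_Id, Value, Row_No
    FROM   dbo.TableA_Col_Values;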

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA
