April 4, 2016 at 9:22 am
I have xml data in table, here xml data looks mentioned below,
And itβs taken from different table for each row
TableA
Tab_idxml_data
1xml_data1 -- eg: Table 1
2xml_data2 -- eg: Table 2
xml_data1
<row>
<col0> EMP_ID </col0> --COLUMN NAMES FOR DATA
<col1> EMP_NAME </col1>--COLUMN NAMES FOR DATA
<col2> EMP_COUNTRY </col2>--COLUMN NAMES FOR DATA
</row>
<row>
<col0> 1001 </col0>
<col1> AAA </col1>
<col2> USA </col2>
</row>
<row>
<col0> 1002 </col0>
<col1> BBB </col1>
<col2> CANADA </col2>
</row>
xml_data2
<row>
<col1> HR_ID </col1>--COLUMN NAMES FOR DATA
<col2> HR_NAME </col2>--COLUMN NAMES FOR DATA
<col3> HR_ADDRESS </col3>--COLUMN NAMES FOR DATA
<col4> HR_CITY </col4>--COLUMN NAMES FOR DATA
</row>
<row>
<col1> 100 </col1>
<col2> GATES </col2>
<col3> MS </col3>
<col4> WASHINGTON </col4>
</row>
<row>
<col1> 101 </col1>
<col2> MALLA </col2>
<col3> KF </col3>
<col4> NEWYORK </col4>
</row>
Actual table View of xml_data1 will be
EMP_IDEMP_NAMEEMP_COUNTRY
1001AAAUSA
1002BBBCANADA
Actual table View of xml_data2 will be
HR_IDHR_NAMEHR_ADDRESSHR_CITY
100GATESMSWASHINGTON
101MALLAKFNEWYORK
Here I need an output like
Table name: TableA_Columns
Tab_IdColumn_IdColumn_NamePosition
11111EMP_ID1
11112EMP_NAME2
11113EMP_COUNTRY3
22221HR_ID1
22222HR_NAME2
22223HR_ADDRESS3
22224HR_CITY4
Table name: TableA_Col_Values
Tab_IdColumn_IdValue_IdValueRow_No
11111100000110011
111121000002AAA1
111131000003USA1
11111100000410022
111121000005BBB2
111131000006CANADA2
2222110000071001
222221000008GATES1
222231000009MS1
222241000010WASHINGTON1
2222110000111011
222221000012MALLA1
222231000013KF1
222241000014NEWYORK1
Please comment if u need any clarification.
Thanks a lot in Advance.
April 4, 2016 at 10:55 am
Here is a quick suggestion towards a solution, let us know if you need more assistance.
π
USE tempdb;
GO
SET NOCOUNT ON;
DECLARE @XQUERY_STR NVARCHAR(MAX) = N'';
DECLARE @PARAM_STR NVARCHAR(MAX) = N'@TXML1 XML';
DECLARE @TXML1 XML = N'<row>
<col0> EMP_ID </col0> --COLUMN NAMES FOR DATA
<col1> EMP_NAME </col1>--COLUMN NAMES FOR DATA
<col2> EMP_COUNTRY </col2>--COLUMN NAMES FOR DATA
</row>
<row>
<col0> 1001 </col0>
<col1> AAA </col1>
<col2> USA </col2>
</row>
<row>
<col0> 1002 </col0>
<col1> BBB </col1>
<col2> CANADA </col2>
</row>';
-- FIRST ROW CONTAINS THE COLUMN NAMES, LIMITING BASE_DATA TO RID = 1
;WITH BASE_DATA AS
(
SELECT
ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS RRID
,XROW.DATA.query('*') AS RXML
FROM @TXML1.nodes('//row') AS XROW(DATA)
)
SELECT @XQUERY_STR = N';WITH BASE_DATA AS
(
SELECT
ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS RRID
,XROW.DATA.query(''*'') AS RXML
FROM @TXML1.nodes(''//row'') AS XROW(DATA)
)
SELECT
' +
STUFF(
(SELECT
N',' + UPPER(BDXML.DATA.value('local-name(.)','varchar(100)')) + N'.DATA.value(''(text())[1]'',''varchar(100)'') AS '
+ BDXML.DATA.value('(text())[1]','varchar(100)') + N'
'
FROM BASE_DATA BD
CROSS APPLY BD.RXML.nodes('/*') AS BDXML(DATA)
WHERE BD.RRID = 1
FOR XML PATH(''),TYPE
).value('(./text())[1]','VARCHAR(MAX)'),1,1,'')
+ N'FROM BASE_DATA BD
' +
STUFF(
(SELECT
N' CROSS APPLY BD.RXML.nodes(''' + BDXML.DATA.value('local-name(.)','varchar(100)') + N''') AS ' + UPPER(BDXML.DATA.value('local-name(.)','varchar(100)')) + N'(DATA)
'
FROM BASE_DATA BD
CROSS APPLY BD.RXML.nodes('/*') AS BDXML(DATA)
WHERE BD.RRID = 1
FOR XML PATH(''),TYPE
).value('(./text())[1]','VARCHAR(MAX)'),1,1,'') +
N'
WHERE BD.RRID > 1
OPTION (MAXDOP 1);
'
EXEC SP_EXECUTESQL @XQUERY_STR,@PARAM_STR,@TXML1;
Output
EMP_ID EMP_NAME EMP_COUNTRY
------- --------- ------------
1001 AAA USA
1002 BBB CANADA
April 4, 2016 at 10:57 am
First and by FAR the most important let me say that this should be done OUTSIDE of SQL Server (unless you are going to be using the set in another query on the same box immediately).
Having said that, a quick Binoogle search for sql server xml data to table will find some very useful and appropriate links.
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
April 4, 2016 at 11:12 am
TheSQLGuru (4/4/2016)
First and by FAR the most important let me say that this should be done OUTSIDE of SQL Server (unless you are going to be using the set in another query on the same box immediately).Having said that, a quick Binoogle search for sql server xml data to table will find some very useful and appropriate links.
Not too hard to handle in SQL Server and performance is OK if the sets are small, wouldn't recommend handling large / huge sets in SQL server though.
π
Edit: Just tested 10.000 row elements on my old laptop, finished in less than a second.
April 4, 2016 at 12:56 pm
Eirikur Eiriksson (4/4/2016)
Here is a quick suggestion towards a solution, let us know if you need more assistance.π
USE tempdb;
GO
SET NOCOUNT ON;
DECLARE @XQUERY_STR NVARCHAR(MAX) = N'';
DECLARE @PARAM_STR NVARCHAR(MAX) = N'@TXML1 XML';
DECLARE @TXML1 XML = N'<row>
<col0> EMP_ID </col0> --COLUMN NAMES FOR DATA
<col1> EMP_NAME </col1>--COLUMN NAMES FOR DATA
<col2> EMP_COUNTRY </col2>--COLUMN NAMES FOR DATA
</row>
<row>
<col0> 1001 </col0>
<col1> AAA </col1>
<col2> USA </col2>
</row>
<row>
<col0> 1002 </col0>
<col1> BBB </col1>
<col2> CANADA </col2>
</row>';
-- FIRST ROW CONTAINS THE COLUMN NAMES, LIMITING BASE_DATA TO RID = 1
;WITH BASE_DATA AS
(
SELECT
ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS RRID
,XROW.DATA.query('*') AS RXML
FROM @TXML1.nodes('//row') AS XROW(DATA)
)
SELECT @XQUERY_STR = N';WITH BASE_DATA AS
(
SELECT
ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS RRID
,XROW.DATA.query(''*'') AS RXML
FROM @TXML1.nodes(''//row'') AS XROW(DATA)
)
SELECT
' +
STUFF(
(SELECT
N',' + UPPER(BDXML.DATA.value('local-name(.)','varchar(100)')) + N'.DATA.value(''(text())[1]'',''varchar(100)'') AS '
+ BDXML.DATA.value('(text())[1]','varchar(100)') + N'
'
FROM BASE_DATA BD
CROSS APPLY BD.RXML.nodes('/*') AS BDXML(DATA)
WHERE BD.RRID = 1
FOR XML PATH(''),TYPE
).value('(./text())[1]','VARCHAR(MAX)'),1,1,'')
+ N'FROM BASE_DATA BD
' +
STUFF(
(SELECT
N' CROSS APPLY BD.RXML.nodes(''' + BDXML.DATA.value('local-name(.)','varchar(100)') + N''') AS ' + UPPER(BDXML.DATA.value('local-name(.)','varchar(100)')) + N'(DATA)
'
FROM BASE_DATA BD
CROSS APPLY BD.RXML.nodes('/*') AS BDXML(DATA)
WHERE BD.RRID = 1
FOR XML PATH(''),TYPE
).value('(./text())[1]','VARCHAR(MAX)'),1,1,'') +
N'
WHERE BD.RRID > 1
OPTION (MAXDOP 1);
'
EXEC SP_EXECUTESQL @XQUERY_STR,@PARAM_STR,@TXML1;
Output
EMP_ID EMP_NAME EMP_COUNTRY
------- --------- ------------
1001 AAA USA
1002 BBB CANADA
This does not match the expected output and you don't need to use dynamic SQL to get the expected output. I've only given the beginnings of the output, because the source of the column_id hasn't been specified. (I can make an educated guess, but I'll leave it up to the OP to try figure it out on their own.)
DECLARE @t TABLE(tab_id INT, xml_data XML)
INSERT @t(tab_id, xml_data)
SELECT v.tab_id, v.xml_data
FROM (
VALUES(
1, CAST('<row>
<col0> EMP_ID </col0> --COLUMN NAMES FOR DATA
<col1> EMP_NAME </col1>--COLUMN NAMES FOR DATA
<col2> EMP_COUNTRY </col2>--COLUMN NAMES FOR DATA
</row>
<row>
<col0> 1001 </col0>
<col1> AAA </col1>
<col2> USA </col2>
</row>
<row>
<col0> 1002 </col0>
<col1> BBB </col1>
<col2> CANADA </col2>
</row>
' AS XML)),
(2, '<row>
<col1> HR_ID </col1>--COLUMN NAMES FOR DATA
<col2> HR_NAME </col2>--COLUMN NAMES FOR DATA
<col3> HR_ADDRESS </col3>--COLUMN NAMES FOR DATA
<col4> HR_CITY </col4>--COLUMN NAMES FOR DATA
</row>
<row>
<col1> 100 </col1>
<col2> GATES </col2>
<col3> MS </col3>
<col4> WASHINGTON </col4>
</row>
<row>
<col1> 101 </col1>
<col2> MALLA </col2>
<col3> KF </col3>
<col4> NEWYORK </col4>
</row>')
) v(tab_id, xml_data)
SELECT v.*, t.c.value('text()[1]', 'NVARCHAR(15)'), ROW_NUMBER() OVER(PARTITION BY v.tab_id ORDER BY t.c)
FROM @t AS v
CROSS APPLY v.xml_data.nodes('row[1]/*') AS t(c)
SELECT v.*, c.c.value('text()[1]', 'NVARCHAR(15)'),
DENSE_RANK() OVER(PARTITION BY v.tab_id ORDER BY r.c),
ROW_NUMBER() OVER(PARTITION BY v.tab_id, r.c ORDER BY c.c)
FROM @t AS v
CROSS APPLY v.xml_data.nodes('row[position()>1]') AS r(c)
CROSS APPLY r.c.nodes('./*') AS c(c)
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
April 4, 2016 at 12:59 pm
Eirikur Eiriksson (4/4/2016)
Edit: Just tested 10.000 row elements on my old laptop, finished in less than a second.
If all the rows were identical, then stats will only have one row in it and XML is smart enough to know that and only process the row once as a "run time constant". Such "grooved" data is the same reason why many people think that an XML splitter will beat any and all Tally Table style splitters. I'm currently writing an article on this all too common performance testing problem.
If the rows had the equivalent of a "unique" cardinality for each row in the 10,000, then ok.
--Jeff Moden
Change is inevitable... Change for the better is not.
April 4, 2016 at 1:08 pm
Jeff Moden (4/4/2016)
Eirikur Eiriksson (4/4/2016)
Edit: Just tested 10.000 row elements on my old laptop, finished in less than a second.If all the rows were identical, then stats will only have one row in it and XML is smart enough to know that and only process the row once as a "run time constant". Such "grooved" data is the same reason why many people think that an XML splitter will beat any and all Tally Table style splitters. I'm currently writing an article on this all too common performance testing problem.
If the rows had the equivalent of a "unique" cardinality for each row in the 10,000, then ok.
10,000 rows in a second is not particularly good performance but not bad either if one is dynamically constructing a table from text/xml. My point is that there are limits to such solutions.
π
On the XML splitter, have you tested combination of the two (8K and XML) by diverting the each based on the length?
April 4, 2016 at 1:16 pm
drew.allen (4/4/2016)
Eirikur Eiriksson (4/4/2016)
Here is a quick suggestion towards a solution, let us know if you need more assistance.π
USE tempdb;
GO
SET NOCOUNT ON;
DECLARE @XQUERY_STR NVARCHAR(MAX) = N'';
DECLARE @PARAM_STR NVARCHAR(MAX) = N'@TXML1 XML';
DECLARE @TXML1 XML = N'<row>
<col0> EMP_ID </col0> --COLUMN NAMES FOR DATA
<col1> EMP_NAME </col1>--COLUMN NAMES FOR DATA
<col2> EMP_COUNTRY </col2>--COLUMN NAMES FOR DATA
</row>
<row>
<col0> 1001 </col0>
<col1> AAA </col1>
<col2> USA </col2>
</row>
<row>
<col0> 1002 </col0>
<col1> BBB </col1>
<col2> CANADA </col2>
</row>';
-- FIRST ROW CONTAINS THE COLUMN NAMES, LIMITING BASE_DATA TO RID = 1
;WITH BASE_DATA AS
(
SELECT
ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS RRID
,XROW.DATA.query('*') AS RXML
FROM @TXML1.nodes('//row') AS XROW(DATA)
)
SELECT @XQUERY_STR = N';WITH BASE_DATA AS
(
SELECT
ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS RRID
,XROW.DATA.query(''*'') AS RXML
FROM @TXML1.nodes(''//row'') AS XROW(DATA)
)
SELECT
' +
STUFF(
(SELECT
N',' + UPPER(BDXML.DATA.value('local-name(.)','varchar(100)')) + N'.DATA.value(''(text())[1]'',''varchar(100)'') AS '
+ BDXML.DATA.value('(text())[1]','varchar(100)') + N'
'
FROM BASE_DATA BD
CROSS APPLY BD.RXML.nodes('/*') AS BDXML(DATA)
WHERE BD.RRID = 1
FOR XML PATH(''),TYPE
).value('(./text())[1]','VARCHAR(MAX)'),1,1,'')
+ N'FROM BASE_DATA BD
' +
STUFF(
(SELECT
N' CROSS APPLY BD.RXML.nodes(''' + BDXML.DATA.value('local-name(.)','varchar(100)') + N''') AS ' + UPPER(BDXML.DATA.value('local-name(.)','varchar(100)')) + N'(DATA)
'
FROM BASE_DATA BD
CROSS APPLY BD.RXML.nodes('/*') AS BDXML(DATA)
WHERE BD.RRID = 1
FOR XML PATH(''),TYPE
).value('(./text())[1]','VARCHAR(MAX)'),1,1,'') +
N'
WHERE BD.RRID > 1
OPTION (MAXDOP 1);
'
EXEC SP_EXECUTESQL @XQUERY_STR,@PARAM_STR,@TXML1;
Output
EMP_ID EMP_NAME EMP_COUNTRY
------- --------- ------------
1001 AAA USA
1002 BBB CANADA
This does not match the expected output and you don't need to use dynamic SQL to get the expected output. I've only given the beginnings of the output, because the source of the column_id hasn't been specified. (I can make an educated guess, but I'll leave it up to the OP to try figure it out on their own.)
DECLARE @t TABLE(tab_id INT, xml_data XML)
INSERT @t(tab_id, xml_data)
SELECT v.tab_id, v.xml_data
FROM (
VALUES(
1, CAST('<row>
<col0> EMP_ID </col0> --COLUMN NAMES FOR DATA
<col1> EMP_NAME </col1>--COLUMN NAMES FOR DATA
<col2> EMP_COUNTRY </col2>--COLUMN NAMES FOR DATA
</row>
<row>
<col0> 1001 </col0>
<col1> AAA </col1>
<col2> USA </col2>
</row>
<row>
<col0> 1002 </col0>
<col1> BBB </col1>
<col2> CANADA </col2>
</row>
' AS XML)),
(2, '<row>
<col1> HR_ID </col1>--COLUMN NAMES FOR DATA
<col2> HR_NAME </col2>--COLUMN NAMES FOR DATA
<col3> HR_ADDRESS </col3>--COLUMN NAMES FOR DATA
<col4> HR_CITY </col4>--COLUMN NAMES FOR DATA
</row>
<row>
<col1> 100 </col1>
<col2> GATES </col2>
<col3> MS </col3>
<col4> WASHINGTON </col4>
</row>
<row>
<col1> 101 </col1>
<col2> MALLA </col2>
<col3> KF </col3>
<col4> NEWYORK </col4>
</row>')
) v(tab_id, xml_data)
SELECT v.*, t.c.value('text()[1]', 'NVARCHAR(15)'), ROW_NUMBER() OVER(PARTITION BY v.tab_id ORDER BY t.c)
FROM @t AS v
CROSS APPLY v.xml_data.nodes('row[1]/*') AS t(c)
SELECT v.*, c.c.value('text()[1]', 'NVARCHAR(15)'),
DENSE_RANK() OVER(PARTITION BY v.tab_id ORDER BY r.c),
ROW_NUMBER() OVER(PARTITION BY v.tab_id, r.c ORDER BY c.c)
FROM @t AS v
CROSS APPLY v.xml_data.nodes('row[position()>1]') AS r(c)
CROSS APPLY r.c.nodes('./*') AS c(c)
Drew
As I posted earlier, "suggestion towards a solution", in fact the desired output is the output of the first CTE without the filter
π
Expect the OP to read and ask further questions if needed.
April 4, 2016 at 2:01 pm
Eirikur Eiriksson (4/4/2016)
Jeff Moden (4/4/2016)
Eirikur Eiriksson (4/4/2016)
Edit: Just tested 10.000 row elements on my old laptop, finished in less than a second.If all the rows were identical, then stats will only have one row in it and XML is smart enough to know that and only process the row once as a "run time constant". Such "grooved" data is the same reason why many people think that an XML splitter will beat any and all Tally Table style splitters. I'm currently writing an article on this all too common performance testing problem.
If the rows had the equivalent of a "unique" cardinality for each row in the 10,000, then ok.
10,000 rows in a second is not particularly good performance but not bad either if one is dynamically constructing a table from text/xml. My point is that there are limits to such solutions.
π
On the XML splitter, have you tested combination of the two (8K and XML) by diverting the each based on the length?
What I worry about is when 10,000 rows appear to have a 1 second performance and then you put it in production and BOOM! The example I use in my presentation works 100K rows of "grooved" data in just 3.3 seconds. With "real" data with some good cardinality according to stats, the same code takes over 8 minutes for the same length of data and same number of elements.
--Jeff Moden
Change is inevitable... Change for the better is not.
April 4, 2016 at 2:44 pm
Jeff Moden (4/4/2016)
Eirikur Eiriksson (4/4/2016)
Jeff Moden (4/4/2016)
Eirikur Eiriksson (4/4/2016)
Edit: Just tested 10.000 row elements on my old laptop, finished in less than a second.If all the rows were identical, then stats will only have one row in it and XML is smart enough to know that and only process the row once as a "run time constant". Such "grooved" data is the same reason why many people think that an XML splitter will beat any and all Tally Table style splitters. I'm currently writing an article on this all too common performance testing problem.
If the rows had the equivalent of a "unique" cardinality for each row in the 10,000, then ok.
10,000 rows in a second is not particularly good performance but not bad either if one is dynamically constructing a table from text/xml. My point is that there are limits to such solutions.
π
On the XML splitter, have you tested combination of the two (8K and XML) by diverting the each based on the length?
What I worry about is when 10,000 rows appear to have a 1 second performance and then you put it in production and BOOM! The example I use in my presentation works 100K rows of "grooved" data in just 3.3 seconds. With "real" data with some good cardinality according to stats, the same code takes over 8 minutes for the same length of data and same number of elements.
I second the concern here and looking forward to read your article.
π
April 4, 2016 at 3:09 pm
Eirikur Eiriksson (4/4/2016)
As I posted earlier, "suggestion towards a solution", in fact the desired output is the output of the first CTE without the filter
No, the OP is asking for data in an EAV format, and at no point do you return data in an EAV format. Just because you would never want to return the data in an EAV format doesn't mean that the OP doesn't have a legitimate reason for returning the data in that format.
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
April 4, 2016 at 3:24 pm
drew.allen (4/4/2016)
Eirikur Eiriksson (4/4/2016)
As I posted earlier, "suggestion towards a solution", in fact the desired output is the output of the first CTE without the filterNo, the OP is asking for data in an EAV format, and at no point do you return data in an EAV format. Just because you would never want to return the data in an EAV format doesn't mean that the OP doesn't have a legitimate reason for returning the data in that format.
Drew
Not certain what you are going on about here, the first CTE does return the row elements as xml, clear example of how to EAV those in the code
π
Not trying to feed anyone but trying to teach them how to catch a fish.
April 5, 2016 at 8:28 am
Eirikur Eiriksson (4/4/2016)
drew.allen (4/4/2016)
Eirikur Eiriksson (4/4/2016)
As I posted earlier, "suggestion towards a solution", in fact the desired output is the output of the first CTE without the filterNo, the OP is asking for data in an EAV format, and at no point do you return data in an EAV format. Just because you would never want to return the data in an EAV format doesn't mean that the OP doesn't have a legitimate reason for returning the data in that format.
Drew
Not certain what you are going on about here, the first CTE does return the row elements as xml, clear example of how to EAV those in the code
π
Not trying to feed anyone but trying to teach them how to catch a fish.
Let me put it another way. You are returning data in a pivoted format when the OP asked for data in an unpivoted format.
Each row contains xml for different tables with differing numbers of columns and differing column names. With the unpivoted approach, it's easy to combine the different tables in the same result set, but it's much more difficult to do that with the pivoted approach.
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
Viewing 13 posts - 1 through 12 (of 12 total)
You must be logged in to reply to this topic. Login to reply