Handling XML data and formatting it into separate tables

  • I have XML data in a table, and for each row the XML is taken from a different source table. The data looks like this:

    TableA

    Tab_id  xml_data
    1       xml_data1   -- e.g. Table 1
    2       xml_data2   -- e.g. Table 2

    xml_data1

    <row>

    <col0> EMP_ID </col0> --COLUMN NAMES FOR DATA

    <col1> EMP_NAME </col1>--COLUMN NAMES FOR DATA

    <col2> EMP_COUNTRY </col2>--COLUMN NAMES FOR DATA

    </row>

    <row>

    <col0> 1001 </col0>

    <col1> AAA </col1>

    <col2> USA </col2>

    </row>

    <row>

    <col0> 1002 </col0>

    <col1> BBB </col1>

    <col2> CANADA </col2>

    </row>

    xml_data2

    <row>

    <col1> HR_ID </col1>--COLUMN NAMES FOR DATA

    <col2> HR_NAME </col2>--COLUMN NAMES FOR DATA

    <col3> HR_ADDRESS </col3>--COLUMN NAMES FOR DATA

    <col4> HR_CITY </col4>--COLUMN NAMES FOR DATA

    </row>

    <row>

    <col1> 100 </col1>

    <col2> GATES </col2>

    <col3> MS </col3>

    <col4> WASHINGTON </col4>

    </row>

    <row>

    <col1> 101 </col1>

    <col2> MALLA </col2>

    <col3> KF </col3>

    <col4> NEWYORK </col4>

    </row>

    Actual table view of xml_data1 will be:

    EMP_ID  EMP_NAME  EMP_COUNTRY
    1001    AAA       USA
    1002    BBB       CANADA

    Actual table view of xml_data2 will be:

    HR_ID  HR_NAME  HR_ADDRESS  HR_CITY
    100    GATES    MS          WASHINGTON
    101    MALLA    KF          NEWYORK

    Here I need an output like

    Table name: TableA_Columns

    Tab_Id  Column_Id  Column_Name  Position
    1       1111       EMP_ID       1
    1       1112       EMP_NAME     2
    1       1113       EMP_COUNTRY  3
    2       2221       HR_ID        1
    2       2222       HR_NAME      2
    2       2223       HR_ADDRESS   3
    2       2224       HR_CITY      4

    Table name: TableA_Col_Values

    Tab_Id  Column_Id  Value_Id  Value       Row_No
    1       1111       1000001   1001        1
    1       1112       1000002   AAA         1
    1       1113       1000003   USA         1
    1       1111       1000004   1002        2
    1       1112       1000005   BBB         2
    1       1113       1000006   CANADA      2
    2       2221       1000007   100         1
    2       2222       1000008   GATES       1
    2       2223       1000009   MS          1
    2       2224       1000010   WASHINGTON  1
    2       2221       1000011   101         2
    2       2222       1000012   MALLA       2
    2       2223       1000013   KF          2
    2       2224       1000014   NEWYORK     2
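    The two target tables would look roughly like this (a DDL sketch only; the data types are indicative, not specified anywhere):

    CREATE TABLE TableA_Columns
    (
        Tab_Id      INT          NOT NULL,
        Column_Id   INT          NOT NULL,
        Column_Name VARCHAR(128) NOT NULL,
        Position    INT          NOT NULL
    );

    CREATE TABLE TableA_Col_Values
    (
        Tab_Id    INT          NOT NULL,
        Column_Id INT          NOT NULL,
        Value_Id  INT          NOT NULL,
        Value     VARCHAR(100) NOT NULL,
        Row_No    INT          NOT NULL
    );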

    Please comment if you need any clarification.

    Thanks a lot in advance.

  • Here is a quick suggestion towards a solution, let us know if you need more assistance.

    😎

    USE tempdb;
    GO
    SET NOCOUNT ON;

    DECLARE @XQUERY_STR NVARCHAR(MAX) = N'';
    DECLARE @PARAM_STR  NVARCHAR(MAX) = N'@TXML1 XML';
    DECLARE @TXML1 XML = N'<row>
    <col0> EMP_ID </col0> --COLUMN NAMES FOR DATA
    <col1> EMP_NAME </col1>--COLUMN NAMES FOR DATA
    <col2> EMP_COUNTRY </col2>--COLUMN NAMES FOR DATA
    </row>
    <row>
    <col0> 1001 </col0>
    <col1> AAA </col1>
    <col2> USA </col2>
    </row>
    <row>
    <col0> 1002 </col0>
    <col1> BBB </col1>
    <col2> CANADA </col2>
    </row>';

    -- FIRST ROW CONTAINS THE COLUMN NAMES, LIMITING BASE_DATA TO RRID = 1
    ;WITH BASE_DATA AS
    (
        SELECT
            ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS RRID
           ,XROW.DATA.query('*') AS RXML
        FROM @TXML1.nodes('//row') AS XROW(DATA)
    )
    SELECT @XQUERY_STR = N';WITH BASE_DATA AS
    (
    SELECT
    ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS RRID
    ,XROW.DATA.query(''*'') AS RXML
    FROM @TXML1.nodes(''//row'') AS XROW(DATA)
    )
    SELECT
    ' +
    STUFF(
        (SELECT
            N',' + UPPER(BDXML.DATA.value('local-name(.)','varchar(100)')) + N'.DATA.value(''(text())[1]'',''varchar(100)'') AS '
            + BDXML.DATA.value('(text())[1]','varchar(100)') + N'
    '
         FROM BASE_DATA BD
         CROSS APPLY BD.RXML.nodes('/*') AS BDXML(DATA)
         WHERE BD.RRID = 1
         FOR XML PATH(''),TYPE
        ).value('(./text())[1]','VARCHAR(MAX)'),1,1,'')
    + N'FROM BASE_DATA BD
    ' +
    STUFF(
        (SELECT
            N' CROSS APPLY BD.RXML.nodes(''' + BDXML.DATA.value('local-name(.)','varchar(100)') + N''') AS ' + UPPER(BDXML.DATA.value('local-name(.)','varchar(100)')) + N'(DATA)
    '
         FROM BASE_DATA BD
         CROSS APPLY BD.RXML.nodes('/*') AS BDXML(DATA)
         WHERE BD.RRID = 1
         FOR XML PATH(''),TYPE
        ).value('(./text())[1]','VARCHAR(MAX)'),1,1,'') +
    N'
    WHERE BD.RRID > 1
    OPTION (MAXDOP 1);
    ';

    EXEC sp_executesql @XQUERY_STR, @PARAM_STR, @TXML1;

    Output

    EMP_ID EMP_NAME EMP_COUNTRY

    ------- --------- ------------

    1001 AAA USA

    1002 BBB CANADA
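    For reference, with the sample document above the dynamic statement that ends up in @XQUERY_STR comes out roughly like this (hand-reconstructed, so the whitespace differs slightly); the first <row> supplies the column aliases and the WHERE clause then skips it, so only the data rows are returned:

    ;WITH BASE_DATA AS
    (
    SELECT
    ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS RRID
    ,XROW.DATA.query('*') AS RXML
    FROM @TXML1.nodes('//row') AS XROW(DATA)
    )
    SELECT
    COL0.DATA.value('(text())[1]','varchar(100)') AS EMP_ID
    ,COL1.DATA.value('(text())[1]','varchar(100)') AS EMP_NAME
    ,COL2.DATA.value('(text())[1]','varchar(100)') AS EMP_COUNTRY
    FROM BASE_DATA BD
    CROSS APPLY BD.RXML.nodes('col0') AS COL0(DATA)
    CROSS APPLY BD.RXML.nodes('col1') AS COL1(DATA)
    CROSS APPLY BD.RXML.nodes('col2') AS COL2(DATA)
    WHERE BD.RRID > 1
    OPTION (MAXDOP 1);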

  • First, and by FAR the most important, let me say that this should be done OUTSIDE of SQL Server (unless you are going to be using the set in another query on the same box immediately).

    Having said that, a quick Binoogle search for "sql server xml data to table" will turn up some very useful and appropriate links.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • TheSQLGuru (4/4/2016)


    First, and by FAR the most important, let me say that this should be done OUTSIDE of SQL Server (unless you are going to be using the set in another query on the same box immediately).

    Having said that, a quick Binoogle search for "sql server xml data to table" will turn up some very useful and appropriate links.

    It's not too hard to handle in SQL Server, and performance is OK if the sets are small; I wouldn't recommend handling large or huge sets in SQL Server, though.

    😎

    Edit: Just tested 10.000 row elements on my old laptop, finished in less than a second.

  • Eirikur Eiriksson (4/4/2016)


    Here is a quick suggestion towards a solution, let us know if you need more assistance.

    😎

    [code and output snipped; quoted in full above]

    This does not match the expected output, and you don't need dynamic SQL to get the expected output. I've only given the beginnings of the output, because the source of the column_id hasn't been specified. (I can make an educated guess, but I'll leave it up to the OP to try to figure it out on their own.)

    DECLARE @t TABLE(tab_id INT, xml_data XML);

    INSERT @t(tab_id, xml_data)
    SELECT v.tab_id, v.xml_data
    FROM (
        VALUES
        (1, CAST('<row>
    <col0> EMP_ID </col0> --COLUMN NAMES FOR DATA
    <col1> EMP_NAME </col1>--COLUMN NAMES FOR DATA
    <col2> EMP_COUNTRY </col2>--COLUMN NAMES FOR DATA
    </row>
    <row>
    <col0> 1001 </col0>
    <col1> AAA </col1>
    <col2> USA </col2>
    </row>
    <row>
    <col0> 1002 </col0>
    <col1> BBB </col1>
    <col2> CANADA </col2>
    </row>
    ' AS XML)),
        (2, '<row>
    <col1> HR_ID </col1>--COLUMN NAMES FOR DATA
    <col2> HR_NAME </col2>--COLUMN NAMES FOR DATA
    <col3> HR_ADDRESS </col3>--COLUMN NAMES FOR DATA
    <col4> HR_CITY </col4>--COLUMN NAMES FOR DATA
    </row>
    <row>
    <col1> 100 </col1>
    <col2> GATES </col2>
    <col3> MS </col3>
    <col4> WASHINGTON </col4>
    </row>
    <row>
    <col1> 101 </col1>
    <col2> MALLA </col2>
    <col3> KF </col3>
    <col4> NEWYORK </col4>
    </row>')
    ) v(tab_id, xml_data);

    -- Column names: shred the first <row> of each document
    SELECT v.*, t.c.value('text()[1]', 'NVARCHAR(15)'),
           ROW_NUMBER() OVER(PARTITION BY v.tab_id ORDER BY t.c)
    FROM @t AS v
    CROSS APPLY v.xml_data.nodes('row[1]/*') AS t(c);

    -- Column values: shred every <row> after the first
    SELECT v.*, c.c.value('text()[1]', 'NVARCHAR(15)'),
           DENSE_RANK() OVER(PARTITION BY v.tab_id ORDER BY r.c),
           ROW_NUMBER() OVER(PARTITION BY v.tab_id, r.c ORDER BY c.c)
    FROM @t AS v
    CROSS APPLY v.xml_data.nodes('row[position()>1]') AS r(c)
    CROSS APPLY r.c.nodes('./*') AS c(c);
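    To take it one step closer to the requested output, here is a rough sketch of the same approach extended to the full TableA_Columns / TableA_Col_Values shape, run in the same batch as the @t population above. The Column_Id formula (tab_id * 1110 + position, which happens to reproduce 1111...1113 and 2221...2224) and the Value_Id running sequence are only guesses, since their source hasn't been specified; the ordering uses local-name() and assumes fewer than ten colN elements per row:

    -- TableA_Columns: shred the first <row>, which holds the column names
    SELECT  v.tab_id AS Tab_Id,
            v.tab_id * 1110
              + ROW_NUMBER() OVER(PARTITION BY v.tab_id
                                  ORDER BY t.c.value('local-name(.)','varchar(10)')) AS Column_Id,  -- guessed formula
            LTRIM(RTRIM(t.c.value('text()[1]','varchar(100)'))) AS Column_Name,
            ROW_NUMBER() OVER(PARTITION BY v.tab_id
                              ORDER BY t.c.value('local-name(.)','varchar(10)'))     AS Position
    FROM @t AS v
    CROSS APPLY v.xml_data.nodes('row[1]/*') AS t(c);

    -- TableA_Col_Values: shred every <row> after the first, numbering the data rows per document
    SELECT  v.tab_id AS Tab_Id,
            v.tab_id * 1110
              + ROW_NUMBER() OVER(PARTITION BY v.tab_id, r.rn
                                  ORDER BY c.c.value('local-name(.)','varchar(10)')) AS Column_Id,
            1000000
              + ROW_NUMBER() OVER(ORDER BY v.tab_id, r.rn,
                                  c.c.value('local-name(.)','varchar(10)'))          AS Value_Id,   -- guessed sequence
            LTRIM(RTRIM(c.c.value('text()[1]','varchar(100)'))) AS Value,
            r.rn AS Row_No
    FROM @t AS v
    CROSS APPLY (SELECT rx.c.query('*') AS x,
                        ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS rn   -- ordinal of each data row
                 FROM v.xml_data.nodes('row[position()>1]') AS rx(c)) AS r
    CROSS APPLY r.x.nodes('*') AS c(c);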

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • Eirikur Eiriksson (4/4/2016)


    Edit: Just tested 10.000 row elements on my old laptop, finished in less than a second.

    If all the rows were identical, then stats will only have one row in it and XML is smart enough to know that and only process the row once as a "run time constant". Such "grooved" data is the same reason why many people think that an XML splitter will beat any and all Tally Table style splitters. I'm currently writing an article on this all too common performance testing problem.

    If the rows had the equivalent of a "unique" cardinality for each row in the 10,000, then ok.
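    For anyone who wants to check this for themselves, here is one possible way to build a test document where every row is unique instead of 10,000 copies of the same row (just a sketch; the row count and element names simply mirror the sample above):

    -- Build an XML document of 10,000 <row> elements with unique values in every row,
    -- so the shredding can't be collapsed into a single "run time constant" lookup.
    -- (The header <row> with the column names is omitted here for brevity.)
    DECLARE @TestXML XML;

    WITH N(n) AS
    (
        SELECT TOP (10000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
        FROM sys.all_columns AS a CROSS JOIN sys.all_columns AS b
    )
    SELECT @TestXML =
    (
        SELECT  1000 + n                             AS col0
               ,'NAME_'    + CONVERT(VARCHAR(10), n) AS col1
               ,'COUNTRY_' + CONVERT(VARCHAR(10), n) AS col2
        FROM N
        FOR XML PATH('row'), TYPE
    );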

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (4/4/2016)


    Eirikur Eiriksson (4/4/2016)


    Edit: Just tested 10.000 row elements on my old laptop, finished in less than a second.

    If all the rows were identical, then stats will only have one row in it and XML is smart enough to know that and only process the row once as a "run time constant". Such "grooved" data is the same reason why many people think that an XML splitter will beat any and all Tally Table style splitters. I'm currently writing an article on this all too common performance testing problem.

    If the rows had the equivalent of a "unique" cardinality for each row in the 10,000, then ok.

    10,000 rows in a second is not particularly good performance but not bad either if one is dynamically constructing a table from text/xml. My point is that there are limits to such solutions.

    😎

    On the XML splitter, have you tested a combination of the two (8K and XML) by diverting each based on the length?
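    Something along these lines is what I have in mind; dbo.XmlSplit is only a placeholder name for an XML-based splitter that returns the same (ItemNumber, Item) shape as DelimitedSplit8K:

    -- Hypothetical dispatcher: short strings go to the 8K splitter, long ones to the XML splitter.
    CREATE FUNCTION dbo.SplitByLength (@pString VARCHAR(MAX), @pDelimiter CHAR(1))
    RETURNS TABLE
    AS
    RETURN
        SELECT s.ItemNumber, s.Item
        FROM   dbo.DelimitedSplit8K(CONVERT(VARCHAR(8000), @pString), @pDelimiter) AS s
        WHERE  DATALENGTH(@pString) <= 8000
        UNION ALL
        SELECT x.ItemNumber, x.Item
        FROM   dbo.XmlSplit(@pString, @pDelimiter) AS x    -- placeholder XML-based splitter
        WHERE  DATALENGTH(@pString) > 8000;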

  • drew.allen (4/4/2016)


    Eirikur Eiriksson (4/4/2016)

    Here is a quick suggestion towards a solution, let us know if you need more assistance.

    [code and output snipped; quoted in full above]

    This does not match the expected output, and you don't need dynamic SQL to get the expected output. I've only given the beginnings of the output, because the source of the column_id hasn't been specified. (I can make an educated guess, but I'll leave it up to the OP to try to figure it out on their own.)

    [code snipped; see Drew's post above]

    Drew

    As I posted earlier, it is a "suggestion towards a solution"; in fact, the desired output is the output of the first CTE without the filter.

    😎

    I expect the OP to read it and ask further questions if needed.

  • Eirikur Eiriksson (4/4/2016)



    10,000 rows in a second is not particularly good performance but not bad either if one is dynamically constructing a table from text/xml. My point is that there are limits to such solutions.

    😎

    On the XML splitter, have you tested a combination of the two (8K and XML) by diverting each based on the length?

    What I worry about is when 10,000 rows appear to have a 1 second performance and then you put it in production and BOOM! The example I use in my presentation works 100K rows of "grooved" data in just 3.3 seconds. With "real" data with some good cardinality according to stats, the same code takes over 8 minutes for the same length of data and same number of elements.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (4/4/2016)



    What I worry about is when 10,000 rows appear to have a 1 second performance and then you put it in production and BOOM! The example I use in my presentation works 100K rows of "grooved" data in just 3.3 seconds. With "real" data with some good cardinality according to stats, the same code takes over 8 minutes for the same length of data and same number of elements.

    I second the concern here and am looking forward to reading your article.

    😎

  • Eirikur Eiriksson (4/4/2016)


    As I posted earlier, "suggestion towards a solution", in fact the desired output is the output of the first CTE without the filter

    No, the OP is asking for data in an EAV format, and at no point do you return data in an EAV format. Just because you would never want to return the data in an EAV format doesn't mean that the OP doesn't have a legitimate reason for returning the data in that format.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • drew.allen (4/4/2016)


    Eirikur Eiriksson (4/4/2016)


    As I posted earlier, "suggestion towards a solution", in fact the desired output is the output of the first CTE without the filter

    No, the OP is asking for data in an EAV format, and at no point do you return data in an EAV format. Just because you would never want to return the data in an EAV format doesn't mean that the OP doesn't have a legitimate reason for returning the data in that format.

    Drew

    Not certain what you are going on about here; the first CTE does return the row elements as XML, and the code is a clear example of how to EAV those.

    😎

    I'm not trying to feed anyone, but trying to teach them how to catch a fish.

  • Eirikur Eiriksson (4/4/2016)


    drew.allen (4/4/2016)


    Eirikur Eiriksson (4/4/2016)


    As I posted earlier, "suggestion towards a solution", in fact the desired output is the output of the first CTE without the filter

    No, the OP is asking for data in an EAV format, and at no point do you return data in an EAV format. Just because you would never want to return the data in an EAV format doesn't mean that the OP doesn't have a legitimate reason for returning the data in that format.

    Drew

    Not certain what you are going on about here, the first CTE does return the row elements as xml, clear example of how to EAV those in the code

    😎

    Not trying to feed anyone but trying to teach them how to catch a fish.

    Let me put it another way. You are returning data in a pivoted format when the OP asked for data in an unpivoted format.

    Each row contains xml for different tables with differing numbers of columns and differing column names. With the unpivoted approach, it's easy to combine the different tables in the same result set, but it's much more difficult to do that with the pivoted approach.
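    To make that concrete (the table names here are purely illustrative): stacking the pivoted shapes fails because the column lists don't line up, while the unpivoted (EAV) shape is the same no matter how many columns each source table has.

    -- Pivoted: cannot be combined, the two shapes have different column lists
    SELECT EMP_ID, EMP_NAME, EMP_COUNTRY       FROM dbo.Emp_Pivoted   -- 3 columns
    UNION ALL
    SELECT HR_ID, HR_NAME, HR_ADDRESS, HR_CITY FROM dbo.HR_Pivoted;   -- 4 columns -> Msg 205

    -- Unpivoted (EAV): one fixed shape for every source table
    SELECT Tab_Id, Column_Id, Value_Id, Value, Row_No
    FROM   dbo.TableA_Col_Values;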

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA
