simple xml to table

  • I got some xml that is essentially an html table that I need to turn into a standard table resultset from a stored proc. If you take this xml and save it as html that is the desired resultset I am looking for. I realize the <td> tags repeat so I would just prefer 'col' + positional index for the col name. Keep in mind that <td> could be is 1 to n. In the end if anyone can point me to some extensive xml Data Type Methods documentation that would be great.

    <table>

    <tr>

    <td>cell1</td>

    <td>cell2</td>

    <td>cell3</td>

    </tr>

    <tr>

    <td>cell4</td>

    <td>cell5</td>

    <td>cell6</td>

    </tr>

    <tr>

    <td>cell7</td>

    <td>cell8</td>

    <td>cell9</td>

    </tr>

    </table>

    This is my attempt but I can't figure out how to get separate cols

    declare @GridData xml = '<table><tr><td>cell1</td><td>cell2</td><td>cell3</td></tr><tr><td>cell4</td><td>cell5</td><td>cell6</td></tr><tr><td>cell7</td><td>cell8</td><td>cell8</td></tr></table>'

    select T.C.value('.', 'nvarchar(max)')

    from @GridData.nodes('//tr') T(C)

  • A quick solution, should get you passed this hurdle

    😎

    USE tempdb;

    GO

    SET NOCOUNT ON;

    DECLARE @TXML XML = '<table>

    <tr>

    <td>cell1</td>

    <td>cell2</td>

    <td>cell3</td>

    </tr>

    <tr>

    <td>cell4</td>

    <td>cell5</td>

    <td>cell6</td>

    </tr>

    <tr>

    <td>cell7</td>

    <td>cell8</td>

    <td>cell9</td>

    </tr>

    </table>';

    ;WITH BASE_DATA AS

    (

    SELECT

    ROW_NUMBER() OVER

    (

    PARTITION BY (SELECT NULL)

    ORDER BY (SELECT NULL)

    ) AS ROW_ID

    ,TAB.DATA.query('*') AS RDATA

    FROM @TXML.nodes('table/tr') AS TAB(DATA)

    )

    ,COLUMN_DATA AS

    (

    SELECT

    BD.ROW_ID

    ,ROW_NUMBER() OVER

    (

    PARTITION BY BD.ROW_ID

    ORDER BY (SELECT NULL)

    ) AS COL_ID

    ,COL.DATA.value('.[1]','VARCHAR(50)') AS COL_VAL

    FROM BASE_DATA BD

    OUTER APPLY BD.RDATA.nodes('td') AS COL(DATA)

    )

    SELECT

    CD.ROW_ID

    ,MAX(CASE WHEN CD.COL_ID = 1 THEN CD.COL_VAL END) AS COL_01

    ,MAX(CASE WHEN CD.COL_ID = 2 THEN CD.COL_VAL END) AS COL_02

    ,MAX(CASE WHEN CD.COL_ID = 3 THEN CD.COL_VAL END) AS COL_03

    ,MAX(CASE WHEN CD.COL_ID = 4 THEN CD.COL_VAL END) AS COL_04

    ,MAX(CASE WHEN CD.COL_ID = 5 THEN CD.COL_VAL END) AS COL_05

    ,MAX(CASE WHEN CD.COL_ID = 6 THEN CD.COL_VAL END) AS COL_06

    ,MAX(CASE WHEN CD.COL_ID = 7 THEN CD.COL_VAL END) AS COL_07

    ,MAX(CASE WHEN CD.COL_ID = 8 THEN CD.COL_VAL END) AS COL_08

    ,MAX(CASE WHEN CD.COL_ID = 9 THEN CD.COL_VAL END) AS COL_09

    ,MAX(CASE WHEN CD.COL_ID = 10 THEN CD.COL_VAL END) AS COL_10

    FROM COLUMN_DATA CD

    GROUP BY CD.ROW_ID;

    Results

    ROW_ID COL_01 COL_02 COL_03 COL_04 COL_05 ...

    ------- ------- ------- ------- ------- -------...

    1 cell1 cell2 cell3 NULL NULL ...

    2 cell4 cell5 cell6 NULL NULL ...

    3 cell7 cell8 cell9 NULL NULL ...

  • That's f***ing impressive, I have been working with various langs for more than 15 years and usually you can gauge how complex something is and how much code it might take to accomplish something. I was not expecting to see the code this complex for especially when you think of how easy it is what want I to do just by using in a foreach in xsl. Damn m$. Now I just to need figure exactly how this code is working, thanks for your help!

  • 🙂

    Alternatively use OPENXML

    DECLARE @xmlDocument nvarchar(max)

    SET @xmlDocument = N'<table>

    <tr>

    <td>cell1</td>

    <td>cell2</td>

    <td>cell3</td>

    </tr>

    <tr>

    <td>cell4</td>

    <td>cell5</td>

    <td>cell6</td>

    </tr>

    <tr>

    <td>cell7</td>

    <td>cell8</td>

    <td>cell9</td>

    </tr>

    </table>';

    DECLARE @docHandle int;

    EXEC sp_xml_preparedocument @docHandle OUTPUT, @xmlDocument;

    WITH rs1 AS (

    SELECT cast(a.text as varchar(100)) [text]

    , row_number() over (partition by b.parentid order by a.parentid) AS colID

    , b.parentid AS rowid

    FROM OPENXML(@docHandle, N'/table/tr') a

    JOIN OPENXML(@docHandle, N'/table/tr') b

    ON a.Localname='#text' and a.parentid = b.id

    )

    SELECT rowid, [1],[2],[3]

    FROM rs1

    PIVOT (max([text]) for colID in ([1],[2],[3])) as pv

  • serg-52 (10/30/2014)


    🙂

    Alternatively use OPENXML

    DECLARE @xmlDocument nvarchar(max)

    SET @xmlDocument = N'<table>

    <tr>

    <td>cell1</td>

    <td>cell2</td>

    <td>cell3</td>

    </tr>

    <tr>

    <td>cell4</td>

    <td>cell5</td>

    <td>cell6</td>

    </tr>

    <tr>

    <td>cell7</td>

    <td>cell8</td>

    <td>cell9</td>

    </tr>

    </table>';

    DECLARE @docHandle int;

    EXEC sp_xml_preparedocument @docHandle OUTPUT, @xmlDocument;

    WITH rs1 AS (

    SELECT cast(a.text as varchar(100)) [text]

    , row_number() over (partition by b.parentid order by a.parentid) AS colID

    , b.parentid AS rowid

    FROM OPENXML(@docHandle, N'/table/tr') a

    JOIN OPENXML(@docHandle, N'/table/tr') b

    ON a.Localname='#text' and a.parentid = b.id

    )

    SELECT rowid, [1],[2],[3]

    FROM rs1

    PIVOT (max([text]) for colID in ([1],[2],[3])) as pv

    Quick thought, I'm hesitant to recommend this approach as it performs worse, has some caveat and can possibly result in bogging the memory, i.e. I cannot see any sp_xml_removedocument statement in this code.

    😎

  • Took your initial example and modified it so I could use a pivot with a dynamic set of cols, not sure how much of a hit I'm taking by using temp tables instead prob not significant enough to be a problem for me.

    declare @txml xml = '<table>

    <tr>

    <td>cell1</td>

    <td>cell2</td>

    <td>cell3</td>

    </tr>

    <tr>

    <td>cell4</td>

    <td>cell5</td>

    <td>cell6</td>

    </tr>

    <tr>

    <td>cell7</td>

    <td>cell8</td>

    <td>cell9</td>

    </tr>

    </table>'

    select row_number() over

    (

    partition by (select null)

    order by (select null)

    ) as row_id

    ,tab.data.query('*') as rdata into #base_data

    from @txml.nodes('table/tr') as tab(data)

    select bd.row_id,row_number() over

    (

    partition by bd.row_id -- causes the row_number to reset for each row

    order by (select null)

    ) as col_id

    ,col.data.value('.[1]','nvarchar(max)') as col_val -- selecting only 1 <td> node at a time

    into #column_data

    from base_data bd

    outer apply bd.rdata.nodes('td') as col(data)

    declare @cols as nvarchar(max), @prettycols as nvarchar(max), @query as nvarchar(max)

    select @cols =

    stuff((select ',' + quotename(col_id)

    from #column_data cd

    group by col_id

    order by col_id

    for xml path(''), type).value('.', 'nvarchar(max)'),1,1,'')

    select @prettycols =

    stuff((select ',' + quotename(col_id) + 'as Col' + cast(col_id as nvarchar)

    from #column_data cd

    group by col_id

    order by col_id

    for xml path(''), type).value('.', 'nvarchar(max)'),1,1,'')

    set @query =

    'select ' + @prettycols + ' from

    (

    select row_id, col_id, col_val

    from #column_data

    ) x

    pivot

    (

    max(col_val) for col_id in (' + @cols + ')

    ) p

    '

    execute(@query)

    drop table #base_data

    drop table #column_data

  • the OP said that the number of <TD>s could be 1..n.

    A solution that limits the number of columns could result in sparsely populated columns and/or some of the data elements not getting processed.

    Why not store the data in XML format and parse it on the way out of the database. If (as an exemplar) the data was polygon coordinates for geographic boundaries, then the only appropriate way to handle this would be to iterate through a collection in the downstream system. This may be a .NET IEnumeratable collection or a JSON object, or even the original XML.

    If you do need to parse it on the way in to the database, then I would seriously suggest holding each data element in its own row with enough fields to identify its unique position in the original file

    i.e.

    ImcomingFileID, RecordID (this would be the <TR> sequence count, ColumnName (this would be the <TD> sequence count within the <TR>) and record value (the contents of the <TD>)

    This does smack a bit of an EAV table, but it is the only way to ensure that all data is captured.

    You can then convert this to the required format with n columns using the PIVOT() function (and some horrid dynamic SQL)

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply