simple xml to table

Question

Post reply

simple xml to table

wall str33t

SSC Veteran

Points: 253
More actions
October 29, 2014 at 2:16 pm

#298005

I got some xml that is essentially an html table that I need to turn into a standard table resultset from a stored proc. If you take this xml and save it as html that is the desired resultset I am looking for. I realize the <td> tags repeat so I would just prefer 'col' + positional index for the col name. Keep in mind that <td> could be is 1 to n. In the end if anyone can point me to some extensive xml Data Type Methods documentation that would be great.
<table>
<tr>
<td>cell1</td>
<td>cell2</td>
<td>cell3</td>
</tr>
<tr>
<td>cell4</td>
<td>cell5</td>
<td>cell6</td>
</tr>
<tr>
<td>cell7</td>
<td>cell8</td>
<td>cell9</td>
</tr>
</table>
This is my attempt but I can't figure out how to get separate cols
declare @GridData xml = '<table><tr><td>cell1</td><td>cell2</td><td>cell3</td></tr><tr><td>cell4</td><td>cell5</td><td>cell6</td></tr><tr><td>cell7</td><td>cell8</td><td>cell8</td></tr></table>'
select T.C.value('.', 'nvarchar(max)')
from @GridData.nodes('//tr') T(C)

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply

Eirikur Eiriksson SSC Guru Points: 182918 More actions · Answer 1

A quick solution, should get you passed this hurdle

😎

USE tempdb;

GO

SET NOCOUNT ON;

DECLARE @TXML XML = '<table>

<tr>

<td>cell1</td>

<td>cell2</td>

<td>cell3</td>

</tr>

<tr>

<td>cell4</td>

<td>cell5</td>

<td>cell6</td>

</tr>

<tr>

<td>cell7</td>

<td>cell8</td>

<td>cell9</td>

</tr>

</table>';

;WITH BASE_DATA AS

(

SELECT

ROW_NUMBER() OVER

(

PARTITION BY (SELECT NULL)

ORDER BY (SELECT NULL)

) AS ROW_ID

,TAB.DATA.query('*') AS RDATA

FROM @TXML.nodes('table/tr') AS TAB(DATA)

)

,COLUMN_DATA AS

(

SELECT

BD.ROW_ID

,ROW_NUMBER() OVER

(

PARTITION BY BD.ROW_ID

ORDER BY (SELECT NULL)

) AS COL_ID

,COL.DATA.value('.[1]','VARCHAR(50)') AS COL_VAL

FROM BASE_DATA BD

OUTER APPLY BD.RDATA.nodes('td') AS COL(DATA)

)

SELECT

CD.ROW_ID

,MAX(CASE WHEN CD.COL_ID = 1 THEN CD.COL_VAL END) AS COL_01

,MAX(CASE WHEN CD.COL_ID = 2 THEN CD.COL_VAL END) AS COL_02

,MAX(CASE WHEN CD.COL_ID = 3 THEN CD.COL_VAL END) AS COL_03

,MAX(CASE WHEN CD.COL_ID = 4 THEN CD.COL_VAL END) AS COL_04

,MAX(CASE WHEN CD.COL_ID = 5 THEN CD.COL_VAL END) AS COL_05

,MAX(CASE WHEN CD.COL_ID = 6 THEN CD.COL_VAL END) AS COL_06

,MAX(CASE WHEN CD.COL_ID = 7 THEN CD.COL_VAL END) AS COL_07

,MAX(CASE WHEN CD.COL_ID = 8 THEN CD.COL_VAL END) AS COL_08

,MAX(CASE WHEN CD.COL_ID = 9 THEN CD.COL_VAL END) AS COL_09

,MAX(CASE WHEN CD.COL_ID = 10 THEN CD.COL_VAL END) AS COL_10

FROM COLUMN_DATA CD

GROUP BY CD.ROW_ID;

Results

ROW_ID COL_01 COL_02 COL_03 COL_04 COL_05 ...

------- ------- ------- ------- ------- -------...

1 cell1 cell2 cell3 NULL NULL ...

2 cell4 cell5 cell6 NULL NULL ...

3 cell7 cell8 cell9 NULL NULL ...

wall str33t SSC Veteran Points: 253 More actions · Answer 2

That's f***ing impressive, I have been working with various langs for more than 15 years and usually you can gauge how complex something is and how much code it might take to accomplish something. I was not expecting to see the code this complex for especially when you think of how easy it is what want I to do just by using in a foreach in xsl. Damn m$. Now I just to need figure exactly how this code is working, thanks for your help!

serg-52 SSCrazy Eights Points: 9913 More actions · Answer 3

🙂

Alternatively use OPENXML

DECLARE @xmlDocument nvarchar(max)

SET @xmlDocument = N'<table>

<tr>

<td>cell1</td>

<td>cell2</td>

<td>cell3</td>

</tr>

<tr>

<td>cell4</td>

<td>cell5</td>

<td>cell6</td>

</tr>

<tr>

<td>cell7</td>

<td>cell8</td>

<td>cell9</td>

</tr>

</table>';

DECLARE @docHandle int;

EXEC sp_xml_preparedocument @docHandle OUTPUT, @xmlDocument;

WITH rs1 AS (

SELECT cast(a.text as varchar(100)) [text]

, row_number() over (partition by b.parentid order by a.parentid) AS colID

, b.parentid AS rowid

FROM OPENXML(@docHandle, N'/table/tr') a

JOIN OPENXML(@docHandle, N'/table/tr') b

ON a.Localname='#text' and a.parentid = b.id

)

SELECT rowid, [1],[2],[3]

FROM rs1

PIVOT (max([text]) for colID in ([1],[2],[3])) as pv

Eirikur Eiriksson SSC Guru Points: 182918 More actions · Answer 4

serg-52 (10/30/2014)
🙂
Alternatively use OPENXML
DECLARE @xmlDocument nvarchar(max)
SET @xmlDocument = N'<table>
<tr>
<td>cell1</td>
<td>cell2</td>
<td>cell3</td>
</tr>
<tr>
<td>cell4</td>
<td>cell5</td>
<td>cell6</td>
</tr>
<tr>
<td>cell7</td>
<td>cell8</td>
<td>cell9</td>
</tr>
</table>';
DECLARE @docHandle int;
EXEC sp_xml_preparedocument @docHandle OUTPUT, @xmlDocument;
WITH rs1 AS (
SELECT cast(a.text as varchar(100)) [text]
, row_number() over (partition by b.parentid order by a.parentid) AS colID
, b.parentid AS rowid
FROM OPENXML(@docHandle, N'/table/tr') a
JOIN OPENXML(@docHandle, N'/table/tr') b
ON a.Localname='#text' and a.parentid = b.id
)
SELECT rowid, [1],[2],[3]
FROM rs1
PIVOT (max([text]) for colID in ([1],[2],[3])) as pv

Quick thought, I'm hesitant to recommend this approach as it performs worse, has some caveat and can possibly result in bogging the memory, i.e. I cannot see any sp_xml_removedocument statement in this code.

😎

wall str33t SSC Veteran Points: 253 More actions · Answer 5

Took your initial example and modified it so I could use a pivot with a dynamic set of cols, not sure how much of a hit I'm taking by using temp tables instead prob not significant enough to be a problem for me.

declare @txml xml = '<table>

<tr>

<td>cell1</td>

<td>cell2</td>

<td>cell3</td>

</tr>

<tr>

<td>cell4</td>

<td>cell5</td>

<td>cell6</td>

</tr>

<tr>

<td>cell7</td>

<td>cell8</td>

<td>cell9</td>

</tr>

</table>'

select row_number() over

(

partition by (select null)

order by (select null)

) as row_id

,tab.data.query('*') as rdata into #base_data

from @txml.nodes('table/tr') as tab(data)

select bd.row_id,row_number() over

(

partition by bd.row_id -- causes the row_number to reset for each row

order by (select null)

) as col_id

,col.data.value('.[1]','nvarchar(max)') as col_val -- selecting only 1 <td> node at a time

into #column_data

from base_data bd

outer apply bd.rdata.nodes('td') as col(data)

declare @cols as nvarchar(max), @prettycols as nvarchar(max), @query as nvarchar(max)

select @cols =

stuff((select ',' + quotename(col_id)

from #column_data cd

group by col_id

order by col_id

for xml path(''), type).value('.', 'nvarchar(max)'),1,1,'')

select @prettycols =

stuff((select ',' + quotename(col_id) + 'as Col' + cast(col_id as nvarchar)

from #column_data cd

group by col_id

order by col_id

for xml path(''), type).value('.', 'nvarchar(max)'),1,1,'')

set @query =

'select ' + @prettycols + ' from

(

select row_id, col_id, col_val

from #column_data

) x

pivot

(

max(col_val) for col_id in (' + @cols + ')

) p

'

execute(@query)

drop table #base_data

drop table #column_data

aaron.reese SSChampion Points: 13498 More actions · Answer 6

the OP said that the number of <TD>s could be 1..n.

A solution that limits the number of columns could result in sparsely populated columns and/or some of the data elements not getting processed.

Why not store the data in XML format and parse it on the way out of the database. If (as an exemplar) the data was polygon coordinates for geographic boundaries, then the only appropriate way to handle this would be to iterate through a collection in the downstream system. This may be a .NET IEnumeratable collection or a JSON object, or even the original XML.

If you do need to parse it on the way in to the database, then I would seriously suggest holding each data element in its own row with enough fields to identify its unique position in the original file

i.e.

ImcomingFileID, RecordID (this would be the <TR> sequence count, ColumnName (this would be the <TD> sequence count within the <TR>) and record value (the contents of the <TD>)

This does smack a bit of an EAV table, but it is the only way to ensure that all data is captured.

You can then convert this to the required format with n columns using the PIVOT() function (and some horrid dynamic SQL)