October 29, 2014 at 2:16 pm
I got some xml that is essentially an html table that I need to turn into a standard table resultset from a stored proc. If you take this xml and save it as html that is the desired resultset I am looking for. I realize the <td> tags repeat so I would just prefer 'col' + positional index for the col name. Keep in mind that <td> could be is 1 to n. In the end if anyone can point me to some extensive xml Data Type Methods documentation that would be great.
<table>
<tr>
<td>cell1</td>
<td>cell2</td>
<td>cell3</td>
</tr>
<tr>
<td>cell4</td>
<td>cell5</td>
<td>cell6</td>
</tr>
<tr>
<td>cell7</td>
<td>cell8</td>
<td>cell9</td>
</tr>
</table>
This is my attempt but I can't figure out how to get separate cols
declare @GridData xml = '<table><tr><td>cell1</td><td>cell2</td><td>cell3</td></tr><tr><td>cell4</td><td>cell5</td><td>cell6</td></tr><tr><td>cell7</td><td>cell8</td><td>cell8</td></tr></table>'
select T.C.value('.', 'nvarchar(max)')
from @GridData.nodes('//tr') T(C)
October 29, 2014 at 2:37 pm
A quick solution, should get you passed this hurdle
😎
USE tempdb;
GO
SET NOCOUNT ON;
DECLARE @TXML XML = '<table>
<tr>
<td>cell1</td>
<td>cell2</td>
<td>cell3</td>
</tr>
<tr>
<td>cell4</td>
<td>cell5</td>
<td>cell6</td>
</tr>
<tr>
<td>cell7</td>
<td>cell8</td>
<td>cell9</td>
</tr>
</table>';
;WITH BASE_DATA AS
(
SELECT
ROW_NUMBER() OVER
(
PARTITION BY (SELECT NULL)
ORDER BY (SELECT NULL)
) AS ROW_ID
,TAB.DATA.query('*') AS RDATA
FROM @TXML.nodes('table/tr') AS TAB(DATA)
)
,COLUMN_DATA AS
(
SELECT
BD.ROW_ID
,ROW_NUMBER() OVER
(
PARTITION BY BD.ROW_ID
ORDER BY (SELECT NULL)
) AS COL_ID
,COL.DATA.value('.[1]','VARCHAR(50)') AS COL_VAL
FROM BASE_DATA BD
OUTER APPLY BD.RDATA.nodes('td') AS COL(DATA)
)
SELECT
CD.ROW_ID
,MAX(CASE WHEN CD.COL_ID = 1 THEN CD.COL_VAL END) AS COL_01
,MAX(CASE WHEN CD.COL_ID = 2 THEN CD.COL_VAL END) AS COL_02
,MAX(CASE WHEN CD.COL_ID = 3 THEN CD.COL_VAL END) AS COL_03
,MAX(CASE WHEN CD.COL_ID = 4 THEN CD.COL_VAL END) AS COL_04
,MAX(CASE WHEN CD.COL_ID = 5 THEN CD.COL_VAL END) AS COL_05
,MAX(CASE WHEN CD.COL_ID = 6 THEN CD.COL_VAL END) AS COL_06
,MAX(CASE WHEN CD.COL_ID = 7 THEN CD.COL_VAL END) AS COL_07
,MAX(CASE WHEN CD.COL_ID = 8 THEN CD.COL_VAL END) AS COL_08
,MAX(CASE WHEN CD.COL_ID = 9 THEN CD.COL_VAL END) AS COL_09
,MAX(CASE WHEN CD.COL_ID = 10 THEN CD.COL_VAL END) AS COL_10
FROM COLUMN_DATA CD
GROUP BY CD.ROW_ID;
Results
ROW_ID COL_01 COL_02 COL_03 COL_04 COL_05 ...
------- ------- ------- ------- ------- -------...
1 cell1 cell2 cell3 NULL NULL ...
2 cell4 cell5 cell6 NULL NULL ...
3 cell7 cell8 cell9 NULL NULL ...
October 29, 2014 at 3:01 pm
That's f***ing impressive, I have been working with various langs for more than 15 years and usually you can gauge how complex something is and how much code it might take to accomplish something. I was not expecting to see the code this complex for especially when you think of how easy it is what want I to do just by using in a foreach in xsl. Damn m$. Now I just to need figure exactly how this code is working, thanks for your help!
October 30, 2014 at 7:06 am
🙂
Alternatively use OPENXML
DECLARE @xmlDocument nvarchar(max)
SET @xmlDocument = N'<table>
<tr>
<td>cell1</td>
<td>cell2</td>
<td>cell3</td>
</tr>
<tr>
<td>cell4</td>
<td>cell5</td>
<td>cell6</td>
</tr>
<tr>
<td>cell7</td>
<td>cell8</td>
<td>cell9</td>
</tr>
</table>';
DECLARE @docHandle int;
EXEC sp_xml_preparedocument @docHandle OUTPUT, @xmlDocument;
WITH rs1 AS (
SELECT cast(a.text as varchar(100)) [text]
, row_number() over (partition by b.parentid order by a.parentid) AS colID
, b.parentid AS rowid
FROM OPENXML(@docHandle, N'/table/tr') a
JOIN OPENXML(@docHandle, N'/table/tr') b
ON a.Localname='#text' and a.parentid = b.id
)
SELECT rowid, [1],[2],[3]
FROM rs1
PIVOT (max([text]) for colID in ([1],[2],[3])) as pv
October 30, 2014 at 8:23 am
serg-52 (10/30/2014)
🙂Alternatively use OPENXML
DECLARE @xmlDocument nvarchar(max)
SET @xmlDocument = N'<table>
<tr>
<td>cell1</td>
<td>cell2</td>
<td>cell3</td>
</tr>
<tr>
<td>cell4</td>
<td>cell5</td>
<td>cell6</td>
</tr>
<tr>
<td>cell7</td>
<td>cell8</td>
<td>cell9</td>
</tr>
</table>';
DECLARE @docHandle int;
EXEC sp_xml_preparedocument @docHandle OUTPUT, @xmlDocument;
WITH rs1 AS (
SELECT cast(a.text as varchar(100)) [text]
, row_number() over (partition by b.parentid order by a.parentid) AS colID
, b.parentid AS rowid
FROM OPENXML(@docHandle, N'/table/tr') a
JOIN OPENXML(@docHandle, N'/table/tr') b
ON a.Localname='#text' and a.parentid = b.id
)
SELECT rowid, [1],[2],[3]
FROM rs1
PIVOT (max([text]) for colID in ([1],[2],[3])) as pv
Quick thought, I'm hesitant to recommend this approach as it performs worse, has some caveat and can possibly result in bogging the memory, i.e. I cannot see any sp_xml_removedocument statement in this code.
😎
October 30, 2014 at 9:14 am
Took your initial example and modified it so I could use a pivot with a dynamic set of cols, not sure how much of a hit I'm taking by using temp tables instead prob not significant enough to be a problem for me.
declare @txml xml = '<table>
<tr>
<td>cell1</td>
<td>cell2</td>
<td>cell3</td>
</tr>
<tr>
<td>cell4</td>
<td>cell5</td>
<td>cell6</td>
</tr>
<tr>
<td>cell7</td>
<td>cell8</td>
<td>cell9</td>
</tr>
</table>'
select row_number() over
(
partition by (select null)
order by (select null)
) as row_id
,tab.data.query('*') as rdata into #base_data
from @txml.nodes('table/tr') as tab(data)
select bd.row_id,row_number() over
(
partition by bd.row_id -- causes the row_number to reset for each row
order by (select null)
) as col_id
,col.data.value('.[1]','nvarchar(max)') as col_val -- selecting only 1 <td> node at a time
into #column_data
from base_data bd
outer apply bd.rdata.nodes('td') as col(data)
declare @cols as nvarchar(max), @prettycols as nvarchar(max), @query as nvarchar(max)
select @cols =
stuff((select ',' + quotename(col_id)
from #column_data cd
group by col_id
order by col_id
for xml path(''), type).value('.', 'nvarchar(max)'),1,1,'')
select @prettycols =
stuff((select ',' + quotename(col_id) + 'as Col' + cast(col_id as nvarchar)
from #column_data cd
group by col_id
order by col_id
for xml path(''), type).value('.', 'nvarchar(max)'),1,1,'')
set @query =
'select ' + @prettycols + ' from
(
select row_id, col_id, col_val
from #column_data
) x
pivot
(
max(col_val) for col_id in (' + @cols + ')
) p
'
execute(@query)
drop table #base_data
drop table #column_data
November 5, 2014 at 4:47 am
the OP said that the number of <TD>s could be 1..n.
A solution that limits the number of columns could result in sparsely populated columns and/or some of the data elements not getting processed.
Why not store the data in XML format and parse it on the way out of the database. If (as an exemplar) the data was polygon coordinates for geographic boundaries, then the only appropriate way to handle this would be to iterate through a collection in the downstream system. This may be a .NET IEnumeratable collection or a JSON object, or even the original XML.
If you do need to parse it on the way in to the database, then I would seriously suggest holding each data element in its own row with enough fields to identify its unique position in the original file
i.e.
ImcomingFileID, RecordID (this would be the <TR> sequence count, ColumnName (this would be the <TD> sequence count within the <TR>) and record value (the contents of the <TD>)
This does smack a bit of an EAV table, but it is the only way to ensure that all data is captured.
You can then convert this to the required format with n columns using the PIVOT() function (and some horrid dynamic SQL)
Viewing 7 posts - 1 through 6 (of 6 total)
You must be logged in to reply to this topic. Login to reply