September 18, 2013 at 9:26 am
How can I get data out of this HTML table into a new SQL server table.
This is a sample HTML code of a table, I have some HTML files with 400000 rows with 48 columns.
I have never worked on a HTML file before.
<html>
<head>
<META content="text/html;charset=Cp1252">
</head>
<body>
<table border>
<tr><th>RESPrimaryId</th><th>ResultCode</th><th>Description</th><th>Abbreviation</th><th>Notes</th><th>LastUpdateTime</th><th>LastUpdateUser</th><th>CreatedTime</th><th>CreatedUser</th></tr>
<tr><td>1000000001</td><td>1</td><td>Cleaned / Lubricated</td><td>1</td><td></td><td>2013-09-09 19:00:00.139</td><td>DBA</td><td>2006-11-03 15:03:41.593</td><td>Installer</td></tr>
<tr><td>1000000002</td><td>2</td><td>Performance Test</td><td>2</td><td></td><td>2013-09-09 19:00:00.154</td><td>DBA</td><td>2006-11-03 15:03:41.640</td><td>Installer</td></tr>
<tr><td>1000000003</td><td>3</td><td>Adjusted / Calibrated</td><td>3</td><td></td><td>2013-09-09 19:00:00.154</td><td>DBA</td><td>2006-11-03 15:03:41.656</td><td>Installer</td></tr>
<tr><td>1000000004</td><td>4</td><td>Removed from Service</td><td>4</td><td></td><td>2013-09-09 19:00:00.154</td><td>DBA</td><td>2006-11-03 15:03:41.656</td><td>Installer</td></tr>
<tr><td>1000000005</td><td>5</td><td>Repair/Tested/Return Svc</td><td>5</td><td></td><td>2013-09-09 19:00:00.154</td><td>DBA</td><td>2006-11-03 15:03:41.671</td><td>Installer</td></tr>
<tr><td>1000000006</td><td>6</td><td>Performed Inservice</td><td>6</td><td></td><td>2013-09-09 19:00:00.154</td><td>DBA</td><td>2006-11-03 15:03:41.671</td><td>Installer</td></tr>
<tr><td>1000000007</td><td>7</td><td>Completed as Requested</td><td>7</td><td></td><td>2013-09-09 19:00:00.154</td><td>DBA</td><td>2006-11-03 15:03:41.671</td><td>Installer</td></tr>
<tr><td>1000000008</td><td>8</td><td>Serviced by Vendor</td><td>8</td><td></td><td>2013-09-09 19:00:00.154</td><td>DBA</td><td>2006-11-03 15:03:41.671</td><td>Installer</td></tr>
<tr><td>1000000009</td><td>9</td><td>Transferred Job</td><td>9</td><td></td><td>2013-09-09 19:00:00.154</td><td>DBA</td><td>2006-11-03 15:03:41.687</td><td>Installer</td></tr>
<tr><td>1000000010</td><td>10</td><td>Not Found/Unavailable</td><td>10</td><td></td><td>2013-09-09 19:00:00.154</td><td>DBA</td><td>2006-11-03 15:03:41.687</td><td>Installer</td></tr>
<tr><td>1000000072</td><td>102</td><td>REPAIR FROM PM</td><td>RFP</td><td>REPAIR THAT IS COMPLETED FROM A FAULT FOUND DURING PM</td><td>2013-09-09 19:00:00.154</td><td>DBA</td><td>2007-03-16 09:35:48.588</td><td>DBA</td></tr>
</table>
</body>
</html>
September 18, 2013 at 9:37 am
What are you trying to do with this? Are you wanting to parse individual values or store the html string? Some clarity in your question will go a long way towards a decent response.
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.
Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
September 18, 2013 at 9:46 am
I'm sorry about the confusion.
I was trying to extract the data(individual values) from the HTML to a Sql table.
September 18, 2013 at 10:13 am
Can you provide the ddl for the target table?
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.
Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
September 18, 2013 at 10:20 am
I am not sure what you mean by ddl forthe target table. 🙁
September 18, 2013 at 10:32 am
knakka99 (9/18/2013)
I am not sure what you mean by ddl forthe target table. 🙁
ddl = data definition language
In other words, the create table statement.
something like:
Create table #MyTarget
(
SomeColumn varchar(25),
etc...
)
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.
Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
September 18, 2013 at 10:46 am
CREATE TABLE [dbo].[DBA_Codes_Result](
[RESPrimaryId] [nvarchar](255) NULL,
[ResultCode] [smallint] NULL,
[Description] [nvarchar](25) NULL,
[Abbreviation] [nvarchar](7) NULL,
[Notes] [nvarchar](250) NULL,
[LastUpdateTime] [datetime] NULL,
[LastUpdateUser] [nvarchar](60) NULL,
[CreatedTime] [datetime] NULL,
[CreatedUser] [nvarchar](60) NULL
) ON [PRIMARY]
September 18, 2013 at 12:52 pm
This works for the sample data you provided. There is no guarantee this will continue to work if the format of the html changes in the future. You may have to tweak some of string manipulation to work with your real code.
declare @html varchar(max) = '<html>
<head>
<META content="text/html;charset=Cp1252">
</head>
<body>
<table border>
<tr><th>RESPrimaryId</th><th>ResultCode</th><th>Description</th><th>Abbreviation</th><th>Notes</th><th>LastUpdateTime</th><th>LastUpdateUser</th><th>CreatedTime</th><th>CreatedUser</th></tr>
<tr><td>1000000001</td><td>1</td><td>Cleaned / Lubricated</td><td>1</td><td></td><td>2013-09-09 19:00:00.139</td><td>DBA</td><td>2006-11-03 15:03:41.593</td><td>Installer</td></tr>
<tr><td>1000000002</td><td>2</td><td>Performance Test</td><td>2</td><td></td><td>2013-09-09 19:00:00.154</td><td>DBA</td><td>2006-11-03 15:03:41.640</td><td>Installer</td></tr>
<tr><td>1000000003</td><td>3</td><td>Adjusted / Calibrated</td><td>3</td><td></td><td>2013-09-09 19:00:00.154</td><td>DBA</td><td>2006-11-03 15:03:41.656</td><td>Installer</td></tr>
<tr><td>1000000004</td><td>4</td><td>Removed from Service</td><td>4</td><td></td><td>2013-09-09 19:00:00.154</td><td>DBA</td><td>2006-11-03 15:03:41.656</td><td>Installer</td></tr>
<tr><td>1000000005</td><td>5</td><td>Repair/Tested/Return Svc</td><td>5</td><td></td><td>2013-09-09 19:00:00.154</td><td>DBA</td><td>2006-11-03 15:03:41.671</td><td>Installer</td></tr>
<tr><td>1000000006</td><td>6</td><td>Performed Inservice</td><td>6</td><td></td><td>2013-09-09 19:00:00.154</td><td>DBA</td><td>2006-11-03 15:03:41.671</td><td>Installer</td></tr>
<tr><td>1000000007</td><td>7</td><td>Completed as Requested</td><td>7</td><td></td><td>2013-09-09 19:00:00.154</td><td>DBA</td><td>2006-11-03 15:03:41.671</td><td>Installer</td></tr>
<tr><td>1000000008</td><td>8</td><td>Serviced by Vendor</td><td>8</td><td></td><td>2013-09-09 19:00:00.154</td><td>DBA</td><td>2006-11-03 15:03:41.671</td><td>Installer</td></tr>
<tr><td>1000000009</td><td>9</td><td>Transferred Job</td><td>9</td><td></td><td>2013-09-09 19:00:00.154</td><td>DBA</td><td>2006-11-03 15:03:41.687</td><td>Installer</td></tr>
<tr><td>1000000010</td><td>10</td><td>Not Found/Unavailable</td><td>10</td><td></td><td>2013-09-09 19:00:00.154</td><td>DBA</td><td>2006-11-03 15:03:41.687</td><td>Installer</td></tr>
<tr><td>1000000072</td><td>102</td><td>REPAIR FROM PM</td><td>RFP</td><td>REPAIR THAT IS COMPLETED FROM A FAULT FOUND DURING PM</td><td>2013-09-09 19:00:00.154</td><td>DBA</td><td>2007-03-16 09:35:48.588</td><td>DBA</td></tr>
</table>
</body>
</html>'
select @html = replace(replace(replace(replace(SUBSTRING(@html, CHARINDEX('CreatedUser</th></tr>', @html) + 21, CHARINDEX('</table>', @html)), '</td>', '<'), '</tr>', '>'), '<tr>', ''), '<td>', '')
select @html = replace(replace(@html, char(13), ''), char(10), '')
select @html = LEFT(@html, len(@html) - 25)
insert DBA_Codes_Result
select
MAX(case when x.ItemNumber = 1 then replace(replace(x.Item, char(13), ''), char(10), '') end) as RESPrimaryId,
MAX(case when x.ItemNumber = 2 then replace(replace(x.Item, char(13), ''), char(10), '') end) as ResultCode,
MAX(case when x.ItemNumber = 3 then replace(replace(x.Item, char(13), ''), char(10), '') end) as Description,
MAX(case when x.ItemNumber = 4 then replace(replace(x.Item, char(13), ''), char(10), '') end) as Abbreviation,
MAX(case when x.ItemNumber = 5 then replace(replace(x.Item, char(13), ''), char(10), '') end) as Notes,
MAX(case when x.ItemNumber = 6 then replace(replace(x.Item, char(13), ''), char(10), '') end) as LastUpdateTime,
MAX(case when x.ItemNumber = 7 then replace(replace(x.Item, char(13), ''), char(10), '') end) as LastUpdateUser,
MAX(case when x.ItemNumber = 8 then replace(replace(x.Item, char(13), ''), char(10), '') end) as CreatedTime,
MAX(case when x.ItemNumber = 9 then replace(replace(x.Item, char(13), ''), char(10), '') end) as CreatedUser
from dbo.DelimitedSplit8K(@html, '>') s
cross apply dbo.DelimitedSplit8K(s.Item, '<') x
where s.ItemNumber > ''
group by s.ItemNumber
select * from DBA_Codes_Result
You can find the DelimitedSplit8K function by following the link in my signature about splitting strings.
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.
Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
September 19, 2013 at 6:34 am
Thank you.. I appreciate it. 🙂
Viewing 9 posts - 1 through 8 (of 8 total)
You must be logged in to reply to this topic. Login to reply