html file to sql server table

  • How can I get data out of this HTML table into a new SQL server table.

    This is a sample HTML code of a table, I have some HTML files with 400000 rows with 48 columns.

    I have never worked on a HTML file before.

    <html>

    <head>

    <META content="text/html;charset=Cp1252">

    </head>

    <body>

    <table border>

    <tr><th>RESPrimaryId</th><th>ResultCode</th><th>Description</th><th>Abbreviation</th><th>Notes</th><th>LastUpdateTime</th><th>LastUpdateUser</th><th>CreatedTime</th><th>CreatedUser</th></tr>

    <tr><td>1000000001</td><td>1</td><td>Cleaned / Lubricated</td><td>1</td><td></td><td>2013-09-09 19:00:00.139</td><td>DBA</td><td>2006-11-03 15:03:41.593</td><td>Installer</td></tr>

    <tr><td>1000000002</td><td>2</td><td>Performance Test</td><td>2</td><td></td><td>2013-09-09 19:00:00.154</td><td>DBA</td><td>2006-11-03 15:03:41.640</td><td>Installer</td></tr>

    <tr><td>1000000003</td><td>3</td><td>Adjusted / Calibrated</td><td>3</td><td></td><td>2013-09-09 19:00:00.154</td><td>DBA</td><td>2006-11-03 15:03:41.656</td><td>Installer</td></tr>

    <tr><td>1000000004</td><td>4</td><td>Removed from Service</td><td>4</td><td></td><td>2013-09-09 19:00:00.154</td><td>DBA</td><td>2006-11-03 15:03:41.656</td><td>Installer</td></tr>

    <tr><td>1000000005</td><td>5</td><td>Repair/Tested/Return Svc</td><td>5</td><td></td><td>2013-09-09 19:00:00.154</td><td>DBA</td><td>2006-11-03 15:03:41.671</td><td>Installer</td></tr>

    <tr><td>1000000006</td><td>6</td><td>Performed Inservice</td><td>6</td><td></td><td>2013-09-09 19:00:00.154</td><td>DBA</td><td>2006-11-03 15:03:41.671</td><td>Installer</td></tr>

    <tr><td>1000000007</td><td>7</td><td>Completed as Requested</td><td>7</td><td></td><td>2013-09-09 19:00:00.154</td><td>DBA</td><td>2006-11-03 15:03:41.671</td><td>Installer</td></tr>

    <tr><td>1000000008</td><td>8</td><td>Serviced by Vendor</td><td>8</td><td></td><td>2013-09-09 19:00:00.154</td><td>DBA</td><td>2006-11-03 15:03:41.671</td><td>Installer</td></tr>

    <tr><td>1000000009</td><td>9</td><td>Transferred Job</td><td>9</td><td></td><td>2013-09-09 19:00:00.154</td><td>DBA</td><td>2006-11-03 15:03:41.687</td><td>Installer</td></tr>

    <tr><td>1000000010</td><td>10</td><td>Not Found/Unavailable</td><td>10</td><td></td><td>2013-09-09 19:00:00.154</td><td>DBA</td><td>2006-11-03 15:03:41.687</td><td>Installer</td></tr>

    <tr><td>1000000072</td><td>102</td><td>REPAIR FROM PM</td><td>RFP</td><td>REPAIR THAT IS COMPLETED FROM A FAULT FOUND DURING PM</td><td>2013-09-09 19:00:00.154</td><td>DBA</td><td>2007-03-16 09:35:48.588</td><td>DBA</td></tr>

    </table>

    </body>

    </html>

  • What are you trying to do with this? Are you wanting to parse individual values or store the html string? Some clarity in your question will go a long way towards a decent response.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • I'm sorry about the confusion.

    I was trying to extract the data(individual values) from the HTML to a Sql table.

  • Can you provide the ddl for the target table?

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • I am not sure what you mean by ddl forthe target table. 🙁

  • knakka99 (9/18/2013)


    I am not sure what you mean by ddl forthe target table. 🙁

    ddl = data definition language

    In other words, the create table statement.

    something like:

    Create table #MyTarget

    (

    SomeColumn varchar(25),

    etc...

    )

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • CREATE TABLE [dbo].[DBA_Codes_Result](

    [RESPrimaryId] [nvarchar](255) NULL,

    [ResultCode] [smallint] NULL,

    [Description] [nvarchar](25) NULL,

    [Abbreviation] [nvarchar](7) NULL,

    [Notes] [nvarchar](250) NULL,

    [LastUpdateTime] [datetime] NULL,

    [LastUpdateUser] [nvarchar](60) NULL,

    [CreatedTime] [datetime] NULL,

    [CreatedUser] [nvarchar](60) NULL

    ) ON [PRIMARY]

  • This works for the sample data you provided. There is no guarantee this will continue to work if the format of the html changes in the future. You may have to tweak some of string manipulation to work with your real code.

    declare @html varchar(max) = '<html>

    <head>

    <META content="text/html;charset=Cp1252">

    </head>

    <body>

    <table border>

    <tr><th>RESPrimaryId</th><th>ResultCode</th><th>Description</th><th>Abbreviation</th><th>Notes</th><th>LastUpdateTime</th><th>LastUpdateUser</th><th>CreatedTime</th><th>CreatedUser</th></tr>

    <tr><td>1000000001</td><td>1</td><td>Cleaned / Lubricated</td><td>1</td><td></td><td>2013-09-09 19:00:00.139</td><td>DBA</td><td>2006-11-03 15:03:41.593</td><td>Installer</td></tr>

    <tr><td>1000000002</td><td>2</td><td>Performance Test</td><td>2</td><td></td><td>2013-09-09 19:00:00.154</td><td>DBA</td><td>2006-11-03 15:03:41.640</td><td>Installer</td></tr>

    <tr><td>1000000003</td><td>3</td><td>Adjusted / Calibrated</td><td>3</td><td></td><td>2013-09-09 19:00:00.154</td><td>DBA</td><td>2006-11-03 15:03:41.656</td><td>Installer</td></tr>

    <tr><td>1000000004</td><td>4</td><td>Removed from Service</td><td>4</td><td></td><td>2013-09-09 19:00:00.154</td><td>DBA</td><td>2006-11-03 15:03:41.656</td><td>Installer</td></tr>

    <tr><td>1000000005</td><td>5</td><td>Repair/Tested/Return Svc</td><td>5</td><td></td><td>2013-09-09 19:00:00.154</td><td>DBA</td><td>2006-11-03 15:03:41.671</td><td>Installer</td></tr>

    <tr><td>1000000006</td><td>6</td><td>Performed Inservice</td><td>6</td><td></td><td>2013-09-09 19:00:00.154</td><td>DBA</td><td>2006-11-03 15:03:41.671</td><td>Installer</td></tr>

    <tr><td>1000000007</td><td>7</td><td>Completed as Requested</td><td>7</td><td></td><td>2013-09-09 19:00:00.154</td><td>DBA</td><td>2006-11-03 15:03:41.671</td><td>Installer</td></tr>

    <tr><td>1000000008</td><td>8</td><td>Serviced by Vendor</td><td>8</td><td></td><td>2013-09-09 19:00:00.154</td><td>DBA</td><td>2006-11-03 15:03:41.671</td><td>Installer</td></tr>

    <tr><td>1000000009</td><td>9</td><td>Transferred Job</td><td>9</td><td></td><td>2013-09-09 19:00:00.154</td><td>DBA</td><td>2006-11-03 15:03:41.687</td><td>Installer</td></tr>

    <tr><td>1000000010</td><td>10</td><td>Not Found/Unavailable</td><td>10</td><td></td><td>2013-09-09 19:00:00.154</td><td>DBA</td><td>2006-11-03 15:03:41.687</td><td>Installer</td></tr>

    <tr><td>1000000072</td><td>102</td><td>REPAIR FROM PM</td><td>RFP</td><td>REPAIR THAT IS COMPLETED FROM A FAULT FOUND DURING PM</td><td>2013-09-09 19:00:00.154</td><td>DBA</td><td>2007-03-16 09:35:48.588</td><td>DBA</td></tr>

    </table>

    </body>

    </html>'

    select @html = replace(replace(replace(replace(SUBSTRING(@html, CHARINDEX('CreatedUser</th></tr>', @html) + 21, CHARINDEX('</table>', @html)), '</td>', '<'), '</tr>', '>'), '<tr>', ''), '<td>', '')

    select @html = replace(replace(@html, char(13), ''), char(10), '')

    select @html = LEFT(@html, len(@html) - 25)

    insert DBA_Codes_Result

    select

    MAX(case when x.ItemNumber = 1 then replace(replace(x.Item, char(13), ''), char(10), '') end) as RESPrimaryId,

    MAX(case when x.ItemNumber = 2 then replace(replace(x.Item, char(13), ''), char(10), '') end) as ResultCode,

    MAX(case when x.ItemNumber = 3 then replace(replace(x.Item, char(13), ''), char(10), '') end) as Description,

    MAX(case when x.ItemNumber = 4 then replace(replace(x.Item, char(13), ''), char(10), '') end) as Abbreviation,

    MAX(case when x.ItemNumber = 5 then replace(replace(x.Item, char(13), ''), char(10), '') end) as Notes,

    MAX(case when x.ItemNumber = 6 then replace(replace(x.Item, char(13), ''), char(10), '') end) as LastUpdateTime,

    MAX(case when x.ItemNumber = 7 then replace(replace(x.Item, char(13), ''), char(10), '') end) as LastUpdateUser,

    MAX(case when x.ItemNumber = 8 then replace(replace(x.Item, char(13), ''), char(10), '') end) as CreatedTime,

    MAX(case when x.ItemNumber = 9 then replace(replace(x.Item, char(13), ''), char(10), '') end) as CreatedUser

    from dbo.DelimitedSplit8K(@html, '>') s

    cross apply dbo.DelimitedSplit8K(s.Item, '<') x

    where s.ItemNumber > ''

    group by s.ItemNumber

    select * from DBA_Codes_Result

    You can find the DelimitedSplit8K function by following the link in my signature about splitting strings.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Thank you.. I appreciate it. 🙂

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply