October 8, 2014 at 1:53 pm
Hello,
I have someone who is sending me .htm documents, with a table in them, and I was wondering if there is a way to import the data from those tables into a SQL table, probably using an SSIS Package.
Thanks,
Sam
October 8, 2014 at 9:27 pm
October 9, 2014 at 1:48 pm
1. Phil's article is excellent; I never read it until today after I saw that Erikur posted it.
2. Second, can you post and example of the HTML files that you are getting?
3. There is an SSIS XSLT task that you could leverage but that is a little more complicated than we need to get...
Lastly, I have to do this from time-to-time in SSIS and the solution varies depending on the condition of the HTML file you are getting. Keeping in mind that, beginning with HTML 4.01, HTML files were required to be well-formed. Though HTML existed before XML, you can now think of HTML files (4.01+) as XML files.
This HTML code:
<html>
<header><title>Test</title></header>
<body>
<h1>blah blah</h1>
<table border="1" cellspacing="0" style="border: black 1px solid">
<tr><th>Date</th><th>Sales</th></tr>
<tr><td>1/1/2003</td><td>12.09</td></tr>
<tr><td>1/3/2003</td><td>85.09</td></tr>
</table>
<p>blah blah</p>
</body>
</html>
Will produce a page that looks like this:
Note the table, that's where the information is that we are looking for...
Now we'll cast it as XML and query it like this:
-- sample html data
DECLARE @html xml =
'
<html>
<header><title>Test</title></header>
<body>
<h1>blah blah</h1>
<table border="1" cellspacing="0" style="border: black 1px solid">
<tr><th>Date</th><th>Sales</th></tr>
<tr><td>1/1/2003</td><td>12.09</td></tr>
<tr><td>1/3/2003</td><td>85.09</td></tr>
</table>
<p>blah blah</p>
</body>
</html>
'
-- code to get extract data from the HTML
SELECTsale_date = xx.value('(./td/text())[1]','varchar(100)'),
sale_amt = xx.value('(./td/text())[2]','varchar(100)')
FROM (values(@html)) t1(x)
CROSS APPLY x.nodes('//table[1]/tr[position()>1]') t2(xx);
Which produces this:
sale_datesale_amt
------------------------
1/1/200312.09
1/3/200385.09
You can use this as a template for understanding how to extract data from an HTML table.
-- Itzik Ben-Gan 2001
October 9, 2014 at 2:08 pm
Nice job Alan!
😎
October 9, 2014 at 2:10 pm
Eirikur Eiriksson (10/9/2014)
Nice job Alan!😎
Thanks!
-- Itzik Ben-Gan 2001
Viewing 5 posts - 1 through 4 (of 4 total)
You must be logged in to reply to this topic. Login to reply