February 26, 2010 at 3:01 am
I would like to be able import data from the following html page into a sql 2008 database table:
http://en.kingofsat.net/pack-skydigital.php
with the following columns:
Name Country Category Packages Encryption(s) SID VPID Audio PMT PCR TXT Last updated
which you can see are across the top.
Just wondering if anyone out there as done anything like this before? Maybe in SSIS? Not sure of the best way to try and strip out the data from a page in this format. There doesnt seem to be any kind of structure I can follow.
I would normally using SSIS for my ETL tasks but this one has me a bit stumped :crazy:
February 26, 2010 at 5:59 am
I had once developed an application which tears data from html pages.
I created a VB.NET application which calls the target URL and gets the response.
When you get the application I parsed the html text for the data that I want.
This requires some hardcoding, you should know the table id if exists, or you should keep some static text for identfying the data.
Eralper
February 26, 2010 at 6:12 am
Maybe you can use excel in the middle, since it extracts data from html tables.
Excel 2007: Ribbon Data, From Web.
February 26, 2010 at 6:16 am
Hello Skull,
Do you have a reference link for a sample?
I know that Excel2007 has many functionalities like data mining components, etc but I did not hear about getting data from HTML tables.
Thanks for the information,
Eralper
February 26, 2010 at 6:35 am
just google it with "excel 2007 import html table".
February 26, 2010 at 6:40 am
Thanks for the link, great!
Some MS Office components are so powerful that there is not a matching utility in SQL Server.
I experienced such a situation on SSRS once. The graph component in MS Office was capable of what I was trying to do on the other hand SSRS failed to manage.
February 26, 2010 at 7:37 am
Ah! Excellent.....
In Excell 2003 I said Data->Import External Data->New Web Query... then put in the http://en.kingofsat.net/pack-skydigital.php URL and hit Go.
It imported everything at least into individual columns - so now I should be able to work some magic in SSIS and get this imported into a database table.
Thanks for your help - never even knew this function existed in Excel!:w00t:
February 26, 2010 at 7:39 am
Yes Web Query exists in Excel2003 too.
I have just made a sample query.
But is there a way that I can get the code running behind?
February 26, 2010 at 7:46 am
Not too sure how you would capture the code....
I'll be working on this for the next while - If I get a SSIS package to take care of all the processing I'll post it up....
February 26, 2010 at 7:52 am
I have used screen scraper http://www.velocityscape.com for this sort of problem int the past. It's fairly straight forward to use - you define a template, and your output layout, and away it goes. I think there is a trial version. Good luck.
Pete Clements
February 26, 2010 at 8:05 am
I have configured the HTTP Connection Manager in SSIS to connect to the http://en.kingofsat.net/pack-skydigital.php URL.
Now I wonder if there is a way to work this into a Data Flow Task to extract the html table :unsure:
February 26, 2010 at 9:18 am
Have you tried xpath?
February 26, 2010 at 5:19 pm
Hi
Don't know if you tried xpath. I have been trying, but the document is not a well formatted xml. AT least that's what sql sever xml type is saying, w3c, and other tools. Do you know another webpage with the same information?
Viewing 13 posts - 1 through 12 (of 12 total)
You must be logged in to reply to this topic. Login to reply