November 13, 2003 at 6:16 pm
Hi,
$150,000 sounds like a lot of money!
From VB, add a reference to MSINET.OCX (the Microsoft Internet Transfer Control).
Add the Inet control to the form as Inet1.
Put the following code in the Form_Load event or under a button:
Dim content As String
Inet1.URL = "http://www.microsoft.com" 'replace with your URL
content = Inet1.OpenURL 'OpenURL returns the body of the page
'wait until the control has finished before trying to save
Do While Inet1.StillExecuting
    DoEvents 'yield so the control can update its state
Loop
'save the downloaded page to a file
Open "C:\mydir\myfile.txt" For Output As #1
Print #1, content
Close #1
Form1.Caption = "Complete"
End
You could easily modify the code to loop through a recordset containing all the URLs you want to collect, stored in a table in your database.
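For example, a minimal sketch of that loop using ADO (the open connection cn and the Urls table with its Url and LocalFile columns are assumptions for illustration, not part of the code above):

'minimal sketch: cn is an open ADODB.Connection; the Urls table
'(columns Url and LocalFile) is hypothetical
Dim rs As ADODB.Recordset
Dim content As String
Dim f As Integer
Set rs = New ADODB.Recordset
rs.Open "SELECT Url, LocalFile FROM Urls", cn
Do Until rs.EOF
    Inet1.URL = rs!Url
    content = Inet1.OpenURL
    f = FreeFile 'take the next free file handle
    Open rs!LocalFile For Output As #f
    Print #f, content
    Close #f
    rs.MoveNext
Loop
rs.Close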
Assuming that the pages are HTML rather than XML, I would be more concerned with parsing the data once you have downloaded the pages. How can you guarantee that the sites in question will not change the design/layout of their pages in a way that renders your data extraction useless?
HTH
Chris
November 14, 2003 at 2:11 am
quote:
Try searching for something called Black Widow. It's sort of a multi-purpose piece of software and it might be free. It can map web sites, download entire web sites or just specific types of files, links or email addresses. (I heard about this from a friend... Really! I've never used it! Really!)
Yes, Black Widow was one of the apps I had in mind when I first responded to this thread.
It shouldn't be difficult to find; however, be aware of the *might be* in "it might be free". Using such a tool from the office in the name of a company is one of the things only a fool would do.
planet115's Excel sheet is really nice and easy to understand, and I would guess fully sufficient for the given task.
Frank
--
Frank Kalis
Microsoft SQL Server MVP
Webmaster: http://www.insidesql.org/blogs
My blog: http://www.insidesql.org/blogs/frankkalis/
November 14, 2003 at 3:03 am
Thanks for all the replies.
I've had a look at planet115's Excel app and it seems to do what I need - and it's legal.
There is nothing sinister in what I am trying to do. I am just trying to replace the keystrokes a human makes with ones controlled by a PC.
Jeremy
November 14, 2003 at 4:18 am
>>There is nothing sinister in what I am trying to do.
I'm sure! Perhaps the point being made is that it is *possible* to abuse such a tool. Given what you are trying to achieve, this would certainly be by accident rather than malicious intent or lack of consideration for others.
It's good manners to make sure that your spider backs off for a reasonable time if it doesn't get the requested resource, and to limit the maximum number of retries to something sensible.
If possible, test the tool against your own webserver first.
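For instance, a back-off loop might look something like this (a sketch only; the retry cap, the delays, the Sleep declaration and the error handling are all assumptions, and Inet1 is the same control used earlier in the thread):

'module-level declaration for the Win32 Sleep API
Private Declare Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As Long)

Function FetchWithBackoff(ByVal url As String) As String
    Const MAX_RETRIES As Long = 3 'illustrative cap on retries
    Dim attempt As Long
    Dim delayMs As Long
    delayMs = 1000 'start with a one-second pause
    For attempt = 1 To MAX_RETRIES
        On Error Resume Next
        Inet1.URL = url
        FetchWithBackoff = Inet1.OpenURL
        If Err.Number = 0 And Len(FetchWithBackoff) > 0 Then Exit Function
        Err.Clear
        On Error GoTo 0
        Sleep delayMs 'back off before trying again
        delayMs = delayMs * 2 'double the wait each attempt
    Next attempt
End Function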
August 29, 2006 at 10:46 am
planet115,
RE: I have VBA code which performs HTTP GET and POST operations by calling the WinInet DLL. You could run it directly from Excel and integrate it with your scraper macros. Let me know if you want it and I will email it to you.
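(For anyone else following along: planet115's code was never posted in the thread, but a minimal WinInet GET from VBA might look like the sketch below. The declarations are the standard 32-bit ones; the HttpGet function itself is an illustration, not planet115's actual code.)

Private Declare Function InternetOpen Lib "wininet.dll" Alias "InternetOpenA" _
    (ByVal sAgent As String, ByVal lAccessType As Long, ByVal sProxyName As String, _
     ByVal sProxyBypass As String, ByVal lFlags As Long) As Long
Private Declare Function InternetOpenUrl Lib "wininet.dll" Alias "InternetOpenUrlA" _
    (ByVal hOpen As Long, ByVal sUrl As String, ByVal sHeaders As String, _
     ByVal lLength As Long, ByVal lFlags As Long, ByVal lContext As Long) As Long
Private Declare Function InternetReadFile Lib "wininet.dll" _
    (ByVal hFile As Long, ByVal sBuffer As String, ByVal lNumBytesToRead As Long, _
     lNumberOfBytesRead As Long) As Integer
Private Declare Function InternetCloseHandle Lib "wininet.dll" (ByVal hInet As Long) As Integer

Function HttpGet(ByVal url As String) As String
    Dim hOpen As Long, hUrl As Long
    Dim buf As String * 4096
    Dim bytesRead As Long
    hOpen = InternetOpen("VBA scraper", 1, vbNullString, vbNullString, 0) '1 = direct connection
    hUrl = InternetOpenUrl(hOpen, url, vbNullString, 0, 0, 0)
    If hUrl <> 0 Then
        Do
            If InternetReadFile(hUrl, buf, Len(buf), bytesRead) = 0 Then Exit Do
            If bytesRead = 0 Then Exit Do 'end of stream
            HttpGet = HttpGet & Left$(buf, bytesRead)
        Loop
        InternetCloseHandle hUrl
    End If
    InternetCloseHandle hOpen
End Function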
Can you send me a copy of the VBA code you use for screen scraping?
Thanks,
John