zssz
Posted January 9, 2009

I am writing a script that reads a web page and extracts certain data from it. I know how to fetch a page and parse the data, but I cannot find a fast way to do it. I am cycling through roughly 5,000 pages, and it takes about a second per page to fetch the page, locate the data, parse it, and print it out. Right now I am using cURL to connect to the website. Is it possible to bring this time down by using something other than cURL? Ideally, I want to write the data to a file (or eventually a database) every hour or two, but with the current method a full pass takes about 80-85 minutes, which seems impractical.
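For reference, the sequential approach described above probably looks roughly like the sketch below (assuming `$urls` holds the page list and `parse_page()` stands in for your own parsing code); each request blocks until the previous one finishes, which is why the per-page times add up:

```php
<?php
// Minimal sketch of a sequential cURL loop.
// Assumptions: $urls is the list of ~5000 page URLs, parse_page() is your own parser.
foreach ($urls as $url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);          // don't let one slow page stall the whole run
    $html = curl_exec($ch);
    curl_close($ch);

    if ($html !== false) {
        parse_page($html); // extract the data and print/store it
    }
}
?>
```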
btherl
Posted January 9, 2009

I think you need to make requests in parallel to speed things up. For example, you could run three copies of your script:

Script 1 processes URLs 0, 3, 6, 9, ...
Script 2 processes URLs 1, 4, 7, 10, ...
Script 3 processes URLs 2, 5, 8, 11, ...

That way the job finishes in roughly a third of the time, since three requests are active at any one moment. There may also be an interface in PHP that does this within a single script, but I have never used such a thing.
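That interface does exist: the curl_multi_* functions let one script run several transfers at once. Below is a rough, untested sketch (reusing the assumed `$urls` list and `parse_page()` helper from the original post; the batch size of 10 is just an illustrative guess and should be tuned to what the remote server tolerates):

```php
<?php
// Sketch: fetch pages in parallel batches with curl_multi.
// Assumptions: $urls and parse_page() are placeholders for your own list and parser.
$batchSize = 10; // number of simultaneous requests per batch

foreach (array_chunk($urls, $batchSize) as $batch) {
    $mh = curl_multi_init();
    $handles = array();

    foreach ($batch as $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_TIMEOUT, 10);
        curl_multi_add_handle($mh, $ch);
        $handles[] = $ch;
    }

    // Drive all transfers in this batch until every one has finished.
    do {
        curl_multi_exec($mh, $running);
        curl_multi_select($mh); // wait for activity instead of busy-looping
    } while ($running > 0);

    // Collect the results and clean up.
    foreach ($handles as $ch) {
        $html = curl_multi_getcontent($ch);
        if ($html !== false) {
            parse_page($html);
        }
        curl_multi_remove_handle($mh, $ch);
        curl_close($ch);
    }

    curl_multi_close($mh);
}
?>
```

With 10 requests in flight at once, the 80-85 minute pass should drop to something closer to 8-10 minutes, assuming the remote server and your bandwidth keep up.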