zssz Posted January 9, 2009

I am currently writing a script that reads a webpage and extracts certain data from it. I know how to read a webpage and parse the data, but I cannot seem to find a fast way to obtain it. I am cycling through ~5000 pages collecting data, and it takes about a second per page to locate the data, parse it, and print it out. Right now I am using cURL to connect to the website. Is it possible to get this time down by using something other than cURL? Ideally, I want to write this data to a file (or eventually a database) every hour or two. With the current method it would take about 80-85 minutes to collect the data, which seems impractical.
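For context, the sequential approach described above usually boils down to a loop like the rough sketch below. get_page_urls() and parse_page() are placeholders, not the actual script; the point is that each request must finish before the next one starts, so the total time is roughly the number of pages multiplied by the time per page.

<?php
// Rough sketch of the sequential fetch-and-parse loop described above.
// get_page_urls() and parse_page() are placeholders for however the real
// script builds its URL list and extracts the data.
$urls = get_page_urls(); // ~5000 pages

foreach ($urls as $url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body as a string
    $html = curl_exec($ch);
    curl_close($ch);

    $data = parse_page($html); // the extraction step
    echo $data, "\n";          // or write to a file / database
}
?>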
btherl Posted January 9, 2009

I think you need to make requests in parallel to speed things up. For example, you could have 3 copies of your script:

Script 1: processes URLs 0, 3, 6, 9, ...
Script 2: processes URLs 1, 4, 7, 10, ...
Script 3: processes URLs 2, 5, 8, 11, ...

Then you'll get things done in 1/3 the time, as you have 3 requests active at any one time. There may be an interface in PHP to do this within one script, but I have never used such a thing.
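The single-script interface mentioned above does exist: PHP's curl_multi functions let one script keep several requests in flight at once. Below is a minimal sketch; the URL list and the parse_page() call are placeholders, and in practice you would queue a batch of a few dozen URLs at a time rather than all ~5000 at once, both to limit memory use and to avoid hammering the remote server.

<?php
// Hypothetical list of URLs; in practice this would be one batch of the ~5000 pages.
$urls = array(
    'http://example.com/page1',
    'http://example.com/page2',
    'http://example.com/page3',
);

$mh = curl_multi_init();
$handles = array();

// Create one easy handle per URL and attach it to the multi handle,
// so all of them are downloaded concurrently.
foreach ($urls as $i => $url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 30);
    curl_multi_add_handle($mh, $ch);
    $handles[$i] = $ch;
}

// Drive all transfers until every one of them has finished.
$running = null;
do {
    curl_multi_exec($mh, $running);
    curl_multi_select($mh); // wait for network activity instead of busy-looping
} while ($running > 0);

// Collect the responses, then clean up the handles.
foreach ($handles as $i => $ch) {
    $html = curl_multi_getcontent($ch);
    // parse_page($html); // placeholder for the existing extraction code
    curl_multi_remove_handle($mh, $ch);
    curl_close($ch);
}
curl_multi_close($mh);
?>

With this approach the wall-clock time for a batch is roughly the time of the slowest request in it, rather than the sum of all of them.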