zssz
Posted January 9, 2009

I am writing a script that reads a web page and extracts certain data from it. I know how to fetch a page and parse the data, but I cannot find a fast way to do it. I am cycling through roughly 5,000 pages, and it takes about a second per page to fetch the page, locate the data, parse it, and print it out. Right now I am using cURL to connect to the website. Is it possible to bring this time down by using something other than cURL? Ideally, I want to write the data to a file (or eventually a database) every hour or two, but with the current method a full pass takes about 80-85 minutes, which seems impractical.
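For reference, the sequential approach described above probably looks roughly like the sketch below (assuming `$urls` holds the page list and `parse_page()` stands in for your own parsing code); each request blocks until the previous one finishes, which is why the per-page times add up:

```php
<?php
// Minimal sketch of a sequential cURL loop.
// Assumptions: $urls is the list of ~5000 page URLs, parse_page() is your own parser.
foreach ($urls as $url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);          // don't let one slow page stall the whole run
    $html = curl_exec($ch);
    curl_close($ch);

    if ($html !== false) {
        parse_page($html); // extract the data and print/store it
    }
}
?>
```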
btherl
Posted January 9, 2009

I think you need to make requests in parallel to speed things up. For example, you could run three copies of your script:

Script 1 processes URLs 0, 3, 6, 9, ...
Script 2 processes URLs 1, 4, 7, 10, ...
Script 3 processes URLs 2, 5, 8, 11, ...

That way the job finishes in roughly a third of the time, since three requests are active at any one moment. There may also be an interface in PHP that does this within a single script, but I have never used such a thing.
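That interface does exist: the curl_multi_* functions let one script run several transfers at once. Below is a rough, untested sketch (reusing the assumed `$urls` list and `parse_page()` helper from the original post; the batch size of 10 is just an illustrative guess and should be tuned to what the remote server tolerates):

```php
<?php
// Sketch: fetch pages in parallel batches with curl_multi.
// Assumptions: $urls and parse_page() are placeholders for your own list and parser.
$batchSize = 10; // number of simultaneous requests per batch

foreach (array_chunk($urls, $batchSize) as $batch) {
    $mh = curl_multi_init();
    $handles = array();

    foreach ($batch as $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_TIMEOUT, 10);
        curl_multi_add_handle($mh, $ch);
        $handles[] = $ch;
    }

    // Drive all transfers in this batch until every one has finished.
    do {
        curl_multi_exec($mh, $running);
        curl_multi_select($mh); // wait for activity instead of busy-looping
    } while ($running > 0);

    // Collect the results and clean up.
    foreach ($handles as $ch) {
        $html = curl_multi_getcontent($ch);
        if ($html !== false) {
            parse_page($html);
        }
        curl_multi_remove_handle($mh, $ch);
        curl_close($ch);
    }

    curl_multi_close($mh);
}
?>
```

With 10 requests in flight at once, the 80-85 minute pass should drop to something closer to 8-10 minutes, assuming the remote server and your bandwidth keep up.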