Fastest way to read webpages?


zssz

I am currently making a script that will read a webpage and extract certain data from it. I know how to read from a webpage and parse the data, but I cannot seem to find a fast way to obtain this data.

 

I am cycling through ~5000 pages collecting data; it takes about a second per page to locate the data, parse it, and print it out.

 

Right now I am using cURL to connect to the website.

Is it possible to get this time down by using something other than cURL?

 

Ideally, I want to be able to write this data to a file (or eventually a database) every hour or two. With the current method, it would take about 80-85 minutes to collect the data, which seems impractical.


I think you need to make requests in parallel to speed things up.  For example, you could have 3 copies of your script.

 

Script 1: Processes urls 0, 3, 6, 9, ...

Script 2: Processes urls 1, 4, 7, 10, ...

Script 3: Processes urls 2, 5, 8, 11, ...

 

Then you'll get things done in roughly a third of the time, since you have 3 requests active at any one time.
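The round-robin split above can be sketched like this. The function name and the worker/offset arguments are just illustrative, assuming you launch each copy of the script with its own offset (e.g. `php fetch_worker.php 0 3`):

```php
<?php
// Round-robin URL partitioning: worker $offset (0-based) out of $workers
// total takes every $workers-th URL starting at position $offset.
function urlsForWorker(array $urls, int $offset, int $workers): array
{
    $mine = [];
    foreach ($urls as $i => $url) {
        if ($i % $workers === $offset) {
            $mine[] = $url;
        }
    }
    return $mine;
}

// Hypothetical invocation: php fetch_worker.php <offset> <workers>
// Worker 0 of 3 would then process urls 0, 3, 6, 9, ...
```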

 

There may be an interface in PHP to do this within one script, but I have never used such a thing.
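For the record, PHP does ship such an interface: the `curl_multi_*` functions drive several cURL handles concurrently inside one script. A minimal sketch, with error handling omitted and the timeout value an arbitrary assumption:

```php
<?php
// Fetch several URLs concurrently with curl_multi; returns url => body.
function fetchAll(array $urls): array
{
    $mh = curl_multi_init();
    $handles = [];
    foreach ($urls as $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return body instead of echoing it
        curl_setopt($ch, CURLOPT_TIMEOUT, 10);          // arbitrary per-request timeout
        curl_multi_add_handle($mh, $ch);
        $handles[$url] = $ch;
    }

    // Drive all transfers until every handle is finished.
    do {
        $status = curl_multi_exec($mh, $running);
        if ($running) {
            curl_multi_select($mh); // wait for socket activity instead of busy-looping
        }
    } while ($running && $status === CURLM_OK);

    $results = [];
    foreach ($handles as $url => $ch) {
        $results[$url] = curl_multi_getcontent($ch);
        curl_multi_remove_handle($mh, $ch);
        curl_close($ch);
    }
    curl_multi_close($mh);
    return $results;
}
```

With this, the total wall-clock time is bounded by the slowest request in each batch rather than the sum of all of them, so fetching ~5000 pages in batches of, say, 20 should cut the run time dramatically.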

