
Hi,

 

I have written a script that opens a list of web pages, extracts data from each file, and returns it. I am running into difficulty because the list of web pages keeps growing and it is taking too long to open and process all the files.

 

I have tried to use pcntl_fork(), but unfortunately the people who host my website don't support it. Any help/suggestions on how I can open and process these files simultaneously would be really useful. The code is:

 

function get_info($url_array, $file_name){

    $data_array = array(); // collected data, one entry per page

    foreach ($url_array as $page){ //loop over every file name

        $new_file_name = $file_name . $page; //build full url to open

        $file = fopen($new_file_name, "r") or exit("Unable to open");

        $data = "";
        while(!feof($file)){
            $line = fgets($file);
            //if its the data i need
            //$data = ... extract the data i need from the line into a single string
        }

        fclose($file);

        $data_array[] = $data; //save this page's data in the array
    }

    return $data_array;
}

 

 

$my_files = array("file1.txt", "file2.txt", "file3.txt");
$my_webpage = "http://www.mywebpage.com/";

$page_data = get_info($my_files, $my_webpage);

 

 

Hi,

 

The two main reasons it is slow are the number of files (up to 300) and the fact that they are remote.

 

I did look at saving the remote files locally as a one-off, but the remote files change on a regular basis, which makes that impossible.

 

Ah, now the story unfolds. How happy would you be starting from scratch with a completely new approach? Unless there is a real reason not to, it would be much preferable to have each script work on only one remote file (ideally, 300 different scripts all running at once): some form of task/queue framework would be good for that. If that sounds like too much hard work, you could instead use cURL to fetch the files in parallel batches (of say 50, 100, or even 300), so you can be processing some files while the rest are still being fetched: for this, look at the curl_multi_* functions.
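
To make that more concrete, here is a rough, untested sketch of the batch approach with the curl_multi_* functions. The fetch_batch() helper and the batch size of 50 are just illustrations, it reuses the $my_files / $my_webpage variables from your script, and the actual extraction step is left as a placeholder for whatever your fgets() loop currently does:

// Fetch a batch of URLs in parallel with curl_multi; returns url => body.
function fetch_batch($urls) {
    $mh = curl_multi_init();
    $handles = array();

    foreach ($urls as $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
        curl_setopt($ch, CURLOPT_TIMEOUT, 30);          // don't hang forever on one slow file
        curl_multi_add_handle($mh, $ch);
        $handles[$url] = $ch;
    }

    // Drive all the transfers until every handle has finished.
    do {
        $status = curl_multi_exec($mh, $running);
        if ($running) {
            curl_multi_select($mh); // wait for network activity rather than busy-looping
        }
    } while ($running && $status == CURLM_OK);

    // Collect the responses and clean up the handles.
    $results = array();
    foreach ($handles as $url => $ch) {
        $results[$url] = curl_multi_getcontent($ch);
        curl_multi_remove_handle($mh, $ch);
        curl_close($ch);
    }
    curl_multi_close($mh);

    return $results;
}

// Fetch in batches of 50 and process each page as its batch completes.
foreach (array_chunk($my_files, 50) as $chunk) {
    $urls = array();
    foreach ($chunk as $page) {
        $urls[] = $my_webpage . $page; // same URL building as in get_info()
    }

    foreach (fetch_batch($urls) as $url => $body) {
        // extract the data you need from $body here,
        // the same way the fgets() loop does in get_info()
    }
}

Because each batch is fetched concurrently, the time per batch is closer to that of the slowest single file rather than the sum of all the requests, which is where the big saving comes from with 300 remote files.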

I was thinking I might have to start from scratch, as I seemed to have hit a dead end!

 

I am aiming to do what you suggested: 300 scripts all running at the same time, each working on a different remote file. The trouble is I have no idea where to start with this. Could you point me in the right direction?

 

Thanks a lot for your help.
