jim_de_bo Posted January 27, 2010

Hi, I have written a script that opens a list of web pages, extracts data from each file, and returns it. I am running into difficulty as the list of web pages is growing and it is taking too long to open and process all the files. I have tried to use pcntl_fork(), but unfortunately my web host doesn't support it. Any help/suggestions on how I can open and process these files simultaneously would be really useful. The code is:

function get_info($url_array, $file_name)
{
    $data_array = array();

    foreach ($url_array as $page) {               // loop over every file name
        $new_file_name = $file_name . $page;      // build the full URL to open
        $file = fopen($new_file_name, "r") or exit("Unable to open");

        $data = "";
        while (!feof($file)) {
            $line = fgets($file);
            // if it's the data I need, extract it from the file into a single string
            $data .= $line;
        }
        fclose($file);

        $data_array[] = $data;                    // save data in the array
    }

    return $data_array;
}

$my_files   = array("file1.txt", "file2.txt", "file3.txt");
$my_webpage = "http://www.mywebpage.com/";

get_info($my_files, $my_webpage);
salathe Posted January 27, 2010

What are the major factors in the performance of your script? What is making it particularly slow: large files, lots of files, remote files, or all of the above?
jim_de_bo Posted January 27, 2010

Hi, the two main reasons it is slow are the number of files (up to 300) and the fact that they are remote. I did look at saving the remote files locally as a one-off, but the remote files change on a regular basis, which makes that impossible.
salathe Posted January 27, 2010

Ah, now the story unfolds. How happy would you be starting from scratch with a completely new approach? Unless there is a real reason not to, it would be much preferable for each script to work on only one remote file (ideally, 300 different scripts all running at once): some form of task/queue framework would be good for that. If that sounds like too much hard work, you could also use cURL to fetch the files in batches (of say 50, 100, or even 300) in parallel; you can then be processing some files while the rest are still being fetched. For this, look to the curl_multi_* functions.
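For example, here is a rough sketch of fetching one batch of URLs in parallel with the curl_multi_* functions. It is untested and the URL list is just a placeholder; you would plug in your own list and run your existing extraction over each returned string.

$urls = array(
    "http://www.mywebpage.com/file1.txt",
    "http://www.mywebpage.com/file2.txt",
    "http://www.mywebpage.com/file3.txt",
);

$mh      = curl_multi_init();
$handles = array();

foreach ($urls as $url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body as a string
    curl_setopt($ch, CURLOPT_TIMEOUT, 30);          // don't hang forever on one file
    curl_multi_add_handle($mh, $ch);
    $handles[$url] = $ch;
}

// Run all the transfers at once, waiting for network activity between steps.
$running = null;
do {
    curl_multi_exec($mh, $running);
    if ($running) {
        curl_multi_select($mh);
    }
} while ($running > 0);

// Collect the results and clean up.
$data_array = array();
foreach ($handles as $url => $ch) {
    $data_array[$url] = curl_multi_getcontent($ch); // raw page body; extract what you need from it here
    curl_multi_remove_handle($mh, $ch);
    curl_close($ch);
}
curl_multi_close($mh);

To handle all 300 files, you would slice the URL list into batches (array_chunk() works) and repeat the above per batch, so you are not holding 300 connections open at once.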
jim_de_bo Posted January 27, 2010

I was thinking I might have to start from scratch, as I seemed to have hit a dead end! I am aiming to do what you suggested: 300 scripts all running at the same time, working on the different remote files. The trouble is I have no idea where to start with this. Could you point me in the right direction? Thanks a lot for your help.