Search the Community
Showing results for tags 'large request'.
-
I am having a heck of a time trying to process a large cURL request. I keep running into issues with the mysql server timing out and also using the callback function within the cURL script (see below). What I am attempting to do is to utilize cURL to log a user into a system (*due to legality issues I cannot specify which) and pull all of their work for the day. I have been successful at pulling all of the work, but each order contains multiple sub-items, each with a specific url. For instance, 300 work orders would translate to approximately 2000 sub-items. Pulling the 300 work order takes approximately 1.6 minutes. For some reason just pulling 10 sub-items is taking in upwards of 3 minutes. After hundreds (and I am not exaggerating) of attempts I have finally decided to reach out to see if someone can take a look at my script and offer some knowledge. Here is the process from a logic standpoint: Pull all user login data from database and log them into the system through cURL (*Works fine) Request all activity and customer information and Insert into database (*Works fine) Get all sub-items and insert them into the database (*ISSUES) Here is the process from a script standpoint: User clicks "Import" button which sends AJAX request to run importWork PHP function. This function only handles requesting the activity and customer information through cURL. (Due to the amount of time it takes for the sub-items to process I have broken up the process). importWork function returns via jSON the number of work orders processed. ***In testing I have also had the importWork function store the urls for all of the sub-items to my database. The only issue is that the logins will start to timeout (Not on my server but the server I am pulling the data from) before all the sub-items can process. javascript automatically sends another AJAX request to pull all of the sub-items. I am using a cURL Multi function to process the url requests. The function will return an array containing the html for each of the urls. I then parse the html to search for the underlying hrefs I need to access the workorders, customer information, and sub-items. So overall, my question is, what is the best way to handle a large cURL request of 2000 urls? Below you will see the rolling_curl function which I am attempting to use to handle the line items. For some reason it doesnt work at all. What I would like to do is simply send an array of urls to the rolling_curl function and have it request all the html for each url. Once a url is finished processing it should run the callback script to insert the data into the database. I figured it would be the best way to handle such a large request in a timely manner. ROLLING CURL FUNCTION: explanation: A function will put all sub-item urls and the corresponding activity ids into an associative array and pass it to the rolling_curl function. The callback function will parse the html and insert the needed data into the database. The only thing this function is doing at this time is dumping "Failed". I have ran the script using the same urls through the standard cURL multi function (See Below) and verified it is pulling the html (So it isn't an issue with the urls). public function rolling_curl($urldata, $callback = null, $custom_options = null) { set_time_limit(0); //extract data from $urldata $urls = $urldata['urls']; $activities = $urldata['activities']; // make sure the rolling window isn't greater than the # of urls $rolling_window = 95; $rolling_window = (sizeof($urls) < $rolling_window) ? sizeof($urls) : $rolling_window; $master = curl_multi_init(); $curl_arr = array(); // add additional curl options here $std_options = array(CURLOPT_RETURNTRANSFER => true, CURLOPT_FOLLOWLOCATION => true, CURLOPT_MAXREDIRS => 5); $options = ($custom_options) ? ($std_options + $custom_options) : $std_options; // start the first batch of requests for ($i = 0; $i < $rolling_window; $i++) { $ch = curl_init(); $options[CURLOPT_URL] = $urls[$i]; curl_setopt_array($ch,$options); curl_multi_add_handle($master, $ch); } do { while(($execrun = curl_multi_exec($master, $running)) == CURLM_CALL_MULTI_PERFORM); if($execrun != CURLM_OK) break; // a request was just completed -- find out which one while($done = curl_multi_info_read($master)) { $info = curl_getinfo($done['handle']); if ($info['http_code'] == 200) { $output = curl_multi_getcontent($done['handle']); // request successful. process output using the callback function. $ref = array_search($info['url'],$urls); $callback($output, $activities[$ref],1); // start a new request (it's important to do this before removing the old one) $ch = curl_init(); $options[CURLOPT_URL] = $urls[$i++]; // increment i curl_setopt_array($ch,$options); curl_multi_add_handle($master, $ch); // remove the curl handle that just completed curl_multi_remove_handle($master, $done['handle']); } else { // request failed. add error handling. $dmp = 'Failed!'; var_dump($dmp); } } } while ($running); curl_multi_close($master); return true; } } STANDARD cURL MULTI FUNCTION: public function requestData($urls) { set_time_limit(0); // Create get requests for each URL $mh = curl_multi_init(); foreach($urls as $i => $url) { $ch[$i] = curl_init($url); curl_setopt($ch[$i], CURLOPT_RETURNTRANSFER, 1); curl_multi_add_handle($mh, $ch[$i]); } // Start performing the request do { $execReturnValue = curl_multi_exec($mh, $runningHandles); } while ($execReturnValue == CURLM_CALL_MULTI_PERFORM); // Loop and continue processing the request while ($runningHandles && $execReturnValue == CURLM_OK) { // Wait forever for network $numberReady = curl_multi_select($mh); if ($numberReady != -1) { // Pull in any new data, or at least handle timeouts do { $execReturnValue = curl_multi_exec($mh, $runningHandles); } while ($execReturnValue == CURLM_CALL_MULTI_PERFORM); } } // Check for any errors if ($execReturnValue != CURLM_OK) { trigger_error("Curl multi read error $execReturnValue\n", E_USER_WARNING); } // Extract the content foreach($urls as $i => $url) { // Check for errors $curlError = curl_error($ch[$i]); if($curlError == "") { $res[$i] = curl_multi_getcontent($ch[$i]); } else { return "Curl error on handle $i: $curlError\n"; } // Remove and close the handle curl_multi_remove_handle($mh, $ch[$i]); curl_close($ch[$i]); } // Clean up the curl_multi handle curl_multi_close($mh); // Print the response data return $res; } An assistance would be greatly appreciated!!! I am racking my head against my desk at this point =0). I am open to any suggestions. I will completely scrap the code and take an alternate approach if you would be so kind as to direct me accordingly. FYI - I am running on a hosted, shared server which I have little control over. PHP plugins might not be a route I can take at this point. But if there is something you know of that will assist me, shoot it at me and I will talk with my hosting provider. THANK YOU!!!!