ItsPawl Posted February 17, 2011

I've been looking into methods of scraping data from pages and have found several examples that use multi-curl to achieve this. But I'm not used to curl and not completely sure how it works, and I need to find the fastest reliable method (I do need all, or close to all, pages every run) of getting the content of a number of pages (about 160). Here is an example I found by searching the web which I managed to implement:

<?php
/**
 * @param $picsArr Array of entries like array('url' => ...).
 *        Each entry gets a 'data' key filled with the response body,
 *        which you can use directly or save in a later step.
 */
function getAllPics(&$picsArr) {
    if (count($picsArr) <= 0) return false;

    $hArr = array(); // array of curl handles
    foreach ($picsArr as $k => $pic) {
        $h = curl_init();
        curl_setopt($h, CURLOPT_URL, $pic['url']);
        curl_setopt($h, CURLOPT_HEADER, 0);
        curl_setopt($h, CURLOPT_RETURNTRANSFER, 1); // return the body instead of printing it
        $hArr[$k] = $h;
    }

    $mh = curl_multi_init();
    foreach ($hArr as $h) {
        curl_multi_add_handle($mh, $h);
    }

    // run all the transfers in parallel
    $running = null;
    do {
        curl_multi_exec($mh, $running);
        curl_multi_select($mh); // wait for activity instead of busy-looping
    } while ($running > 0);

    // collect the results into the array
    foreach ($hArr as $k => $h) {
        $picsArr[$k]['data'] = curl_multi_getcontent($h);

        // record the image type (e.g. "jpeg", "png") from the Content-Type header
        $info = curl_getinfo($h);
        if (preg_match("/^image\/(.*)$/", $info['content_type'], $matches)) {
            $picsArr[$k]['type'] = $matches[1];
        }
    }

    // close all the connections
    foreach ($hArr as $h) {
        curl_multi_remove_handle($mh, $h);
        curl_close($h);
    }
    curl_multi_close($mh);

    return true;
}
?>

Since time is critical in my script, I'd like to ask whether you think this is a good implementation, or whether you can point me toward one that will save me noticeable run-time.
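For context, a minimal sketch of how that function would be called — $pageUrls here is just a placeholder standing in for the real list of roughly 160 URLs:

<?php
// Hypothetical list standing in for the ~160 real URLs.
$pageUrls = array(
    'http://example.com/page1.html',
    'http://example.com/page2.html',
);

// Build the array shape getAllPics() expects: each entry has a 'url' key.
$pages = array();
foreach ($pageUrls as $url) {
    $pages[] = array('url' => $url);
}

if (getAllPics($pages)) {
    foreach ($pages as $page) {
        // 'data' now holds the raw response body for each URL.
        echo strlen($page['data']) . " bytes from " . $page['url'] . "\n";
    }
}
?>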
BlueSkyIS Posted February 17, 2011

I might not understand the complexity of the problem, but is file_get_contents() not an option? If possible, I would loop over each URL and use file_get_contents() to get the page content.
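A minimal sketch of that sequential approach, assuming $urls is a plain array of page URLs:

<?php
// Hypothetical list of page URLs.
$urls = array(
    'http://example.com/page1.html',
    'http://example.com/page2.html',
);

$contents = array();
foreach ($urls as $url) {
    // Each request blocks until the page has downloaded, so the total time
    // is roughly the sum of all the individual fetches.
    $body = file_get_contents($url);
    if ($body !== false) {
        $contents[$url] = $body;
    }
}
?>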
ItsPawl (Author) Posted February 17, 2011

I'm trying to get all the pages' contents in as short a time as possible. As I understand it, using multi-curl lets me fetch all the pages in parallel instead of one after the other, which cuts out the latency wait times. (I'm pretty sure calling file_get_contents() for each one would take longer, unless maybe it were used with threads somehow.) I'm only asking whether anyone familiar with these things knows of a still faster way, since a low execution time is important in my program.
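A rough way to check the difference for a given set of pages, assuming the getAllPics() function from the first post and a hypothetical $urls array — just a sketch, not a proper benchmark:

<?php
// Hypothetical list of the pages to fetch.
$urls = array(
    'http://example.com/page1.html',
    'http://example.com/page2.html',
);

// Sequential: one file_get_contents() call per URL.
$start = microtime(true);
foreach ($urls as $url) {
    file_get_contents($url);
}
$sequential = microtime(true) - $start;

// Parallel: all URLs through the multi-curl function above.
$pages = array();
foreach ($urls as $url) {
    $pages[] = array('url' => $url);
}
$start = microtime(true);
getAllPics($pages);
$parallel = microtime(true) - $start;

printf("sequential: %.2fs, multi-curl: %.2fs\n", $sequential, $parallel);
?>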
Archived
This topic is now archived and is closed to further replies.