etrader Posted February 16, 2011
(Topic: https://forums.phpfreaks.com/topic/227913-high-cpu-load-on-simple_html_dom/)

I successfully load a page with simple_html_dom.php (developed at simplehtmldom.sourceforge.net) as:

```php
$html = file_get_html('externalpage');
```

But sometimes this puts a high load on the CPU and the page does not finish loading for a long time (probably due to the external site's server). How can I skip the process when it is not behaving normally, to avoid the high CPU usage?
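One way to guard against a slow external server (a hedged sketch, not from the thread): fetch the page yourself with cURL under hard time limits, and only parse it when the fetch succeeds. `str_get_html()` is simple_html_dom's string-parsing counterpart to `file_get_html()`; the helper name `fetch_with_timeout()` and the 5-second limits are illustrative assumptions, not recommendations.

```php
<?php
// Fetch a URL with hard limits so a slow remote server cannot
// keep the script (and the CPU) busy indefinitely.
function fetch_with_timeout(string $url, int $seconds = 5)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $seconds); // give up on slow connects
    curl_setopt($ch, CURLOPT_TIMEOUT, $seconds);        // cap the whole transfer
    $html = curl_exec($ch);
    curl_close($ch);
    return $html; // false on timeout or error
}

// Usage sketch: skip parsing entirely when the fetch failed or timed out.
// $page = fetch_with_timeout('http://example.com');
// if ($page !== false) {
//     $dom = str_get_html($page); // simple_html_dom's string parser
// }
```

This way the expensive DOM parsing only runs on pages that actually arrived within the time limit.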
quasiman Posted February 16, 2011

Download the file first? (untested)

```php
$cachepage = "cache/pagename.html";
$external  = "http://google.com";

if (!file_exists($cachepage)) {
    $ch = curl_init($external);
    $fp = fopen($cachepage, 'w');            // create the cache file for writing
    curl_setopt($ch, CURLOPT_HEADER, false);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FILE, $fp);     // stream the response into the file
    curl_exec($ch);
    curl_close($ch);
    fclose($fp);
}

$html = file_get_html($cachepage);
unlink($cachepage); // delete the cached copy (unset() would only clear the variable)
```
etrader (Author) Posted February 16, 2011

This will write the contents of http://google.com to pagename.html in the cache directory? Do I need to create that file myself for it to be written to?
quasiman Posted February 16, 2011

You would need to create the 'cache' directory, but fopen() with 'w' creates and writes the HTML file for you. Then at the end, after you're done with the file, unlink($cachepage) deletes it... to save disk space or whatever.
xylex Posted February 16, 2011

Just throwing it out there, but maybe using one of the half dozen built-in DOM libraries in PHP instead of one thrown together with a bunch of regular expressions and recursive calls would improve performance?
etrader (Author) Posted February 16, 2011

> Just throwing it out there, but maybe using one of the half dozen built-in DOM libraries in PHP instead of one thrown together with a bunch of regular expressions and recursive calls would improve performance?

What do you mean by that? I didn't get it.
xylex Posted February 16, 2011

PHP already has a number of XML libraries that provide all the functionality simplehtmldom appears to be replicating, which is probably why that project seems abandoned (last commit was in 2008): http://us2.php.net/manual/en/refs.xml.php

Not sure what you're trying to do, but DOM and SimpleXML come to mind as good ones to look at.
etrader (Author) Posted February 16, 2011

The interesting point about simple_html_dom is the ability to capture the links on a webpage:

```php
foreach ($html->find('a') as $element) {
    $string = $element->href;
}
```

Is it possible to do so without simple_html_dom?
xylex Posted February 16, 2011

http://us2.php.net/manual/en/domdocument.getelementsbytagname.php
http://us2.php.net/manual/en/class.domnode.php#domnode.props.attributes
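Following those pointers, the href-collecting loop from earlier in the thread can be reproduced with the built-in DOM extension alone. A sketch (the helper name `extract_links()` is my own, not from the thread):

```php
<?php
// Collect all href attributes from an HTML string using PHP's
// built-in DOM extension instead of simple_html_dom.
function extract_links(string $html): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // tolerate real-world, non-well-formed HTML
    $doc->loadHTML($html);
    libxml_clear_errors();

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $a) {
        if ($a->hasAttribute('href')) {
            $links[] = $a->getAttribute('href');
        }
    }
    return $links;
}
```

Because the parser is a compiled C extension rather than a regex-based PHP class, this tends to be far lighter on the CPU for large pages.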
etrader (Author) Posted February 17, 2011

> Download the file first? (untested)

The only problem I have is that it writes pagename.html with chmod 644, and then the file cannot be deleted or re-written on the next run.
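Two things are tangled here: unset() only destroys the PHP variable, while unlink() is what removes the file itself, and a file left behind with restrictive permissions blocks the next run. A sketch of a safer refresh (the helper name `refresh_cache()` is hypothetical, assuming the script's user may delete the stale copy):

```php
<?php
// Refresh a cache file, removing a stale unwritable copy first
// so the next run is not blocked by leftover permissions.
function refresh_cache(string $path, string $contents): bool
{
    if (file_exists($path) && !is_writable($path)) {
        // e.g. a copy written with mode 0644 by another user
        if (!@unlink($path)) {
            return false; // cannot delete the stale copy: give up cleanly
        }
    }
    return file_put_contents($path, $contents) !== false;
}
```

If the delete itself fails, the file was most likely created by a different system user (for example the web server), and the fix belongs at the permissions level (matching users, or a group-writable cache directory), not in the PHP code.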