XeroXer, posted February 8, 2007:

Hi there!

I made a little script to extract all the links from a site. It works like a charm as it is, but I need it to work on a page like http://www.google.com/search?q=xeroxer, and then I get this error:

Warning: file(http://www.google.com/search?q=xeroxer) [function.file]: failed to open stream: HTTP request failed! HTTP/1.0 403 Forbidden in /customers/_/_/httpd.www/-/index.php on line 2

```php
<?php
$innehall = file('http://www.google.se/'); // one array element per line of the page
foreach ($innehall as $enrad) {
    // Keep the text from the first "http://" up to (not including) the next '"'
    $slicerad = strstr($enrad, 'http://');
    $slicerad = strrev($slicerad);
    $slicerad = strrchr($slicerad, '"');
    $slicerad = strrev($slicerad);
    $slicerad = rtrim($slicerad, '"');
    if ($slicerad != "") {
        echo $slicerad;
        echo "<br>\n";
    } else {
        echo "";
    }
}
?>
```

Does anyone have an idea how to make it work? Suggestions for improving the code (even though it works) are welcome too.
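The loop above keeps at most one URL per line, and only URLs that start with http://. A minimal sketch of an alternative, assuming a regular-expression heuristic over quoted href attributes is good enough (it is not a real HTML parser, and unquoted href values are skipped); `extract_links` is a hypothetical name, and it sticks to functions that also exist in PHP 4.3:

```php
<?php
// Hypothetical helper: return every quoted href value found in $html.
// A regex heuristic, not a full HTML parser; unquoted hrefs are skipped.
function extract_links($html)
{
    $links = array();
    // Matches href="..." or href='...' (case-insensitive); group 2 is the URL,
    // and \1 makes the closing quote match the opening one.
    if (preg_match_all('/href\s*=\s*(["\'])(.*?)\1/i', $html, $matches)) {
        $links = $matches[2];
    }
    return $links;
}

$html = '<a href="http://example.com/a">A</a> <A HREF=\'/b\'>B</A>';
foreach (extract_links($html) as $link) {
    echo htmlspecialchars($link) . "<br>\n";
}
```

On a live page the fetched lines can be joined first, e.g. `extract_links(implode('', file($url)))`, so several links on one line are all found.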
hvle, posted February 8, 2007:

Your code misses a lot of links (it finds at most one per line, and only absolute http:// ones). The code below uses DOM, and it could access http://www.google.com/search?q=xeroxer with no problem, provided that remote file access is enabled in your PHP configuration:

```php
<?php
$doc = new DOMDocument();
$doc->loadHTMLFile("http://www.google.com/search?q=xeroxer");
$linkNodes = $doc->getElementsByTagName("a");
for ($i = 0; $i < $linkNodes->length; $i++) {
    echo getLink($linkNodes->item($i)) . "<br>";
}

// Return the href of an <a> node, or "" if it has none (e.g. <a name="...">).
// Objects are passed by handle, so no & is needed here.
function getLink(DOMNode $dn)
{
    $link = $dn->attributes->getNamedItem('href');
    return $link === null ? "" : $link->nodeValue;
}
?>
```

DOM is perfect for writing spiders.
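The same DOM idea can also work on an HTML string instead of a URL, which keeps the fetching step separate and makes it easy to silence the parser warnings that real-world HTML usually triggers. A sketch under those assumptions; `dom_links` is a hypothetical name and it needs PHP 5 with the dom extension:

```php
<?php
// Hypothetical helper: collect the href values of all <a> elements in an HTML string.
function dom_links($html)
{
    $doc = new DOMDocument();
    // Collect libxml warnings about malformed HTML instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = array();
    foreach ($doc->getElementsByTagName('a') as $a) {
        // Skip anchors without an href, such as <a name="top">.
        if ($a->hasAttribute('href')) {
            $links[] = $a->getAttribute('href');
        }
    }
    return $links;
}

print_r(dom_links('<p><a href="http://example.com/">x</a><a name="top">y</a><a href="/next">z</a></p>'));
```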
obsidian, posted February 8, 2007:

Quoting hvle: "Provided that remote file access is enabled in your PHP configuration ... DOM is perfect for writing spiders."

Therein lies the problem, I believe. If remote file access is turned on, you shouldn't need DOM in this case at all, since file() can already fetch remote pages. Beyond that, recommending DOMDocument assumes that PHP5 is installed, which is not yet a safe assumption. I agree that it may be very useful, but more information about the server is needed before recommending a rewrite that drastic.
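Since remote access evidently works (the error comes back from the server as a 403), the refusal itself is worth a look. One possible cause, assumed here and not confirmed in the thread, is that PHP's HTTP wrapper sends no User-Agent header unless the user_agent ini setting is configured, and some servers reject such requests. A minimal sketch that attaches one through a stream context; `make_http_context`, `fetch_lines`, and the User-Agent string are hypothetical:

```php
<?php
// Hypothetical helper: an HTTP stream context that sends a User-Agent header.
// Assumption: the 403 is caused by the missing User-Agent; the server may still refuse.
function make_http_context($userAgent)
{
    return stream_context_create(array(
        'http' => array(
            'method'     => 'GET',
            'user_agent' => $userAgent,
            'timeout'    => 10, // read timeout in seconds
        ),
    ));
}

// Hypothetical helper: fetch a URL line by line, returning an empty array on failure.
function fetch_lines($url, $userAgent)
{
    // file() takes the stream context as its third argument; 0 = default flags.
    $lines = file($url, 0, make_http_context($userAgent));
    return $lines === false ? array() : $lines;
}

// Usage (placeholder User-Agent string):
// $innehall = fetch_lines('http://www.google.com/search?q=xeroxer', 'Mozilla/5.0 (compatible; ExampleLinkBot/0.1)');
```

The returned array has the same shape as the original `file()` call, so the existing loop can consume it unchanged.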
hvle, posted February 8, 2007:

You're right, I forgot to mention PHP5. But PHP5 isn't very new anymore, and PHP6 is coming. Most (though not all) servers have PHP5 installed, and according to the manual, DOM in PHP 5 is stable, no longer experimental. It's pretty safe to use, especially for spidering, where simplicity is preferred over stability and reliability.
obsidian, posted February 8, 2007:

Quoting hvle: "PHP5 is not very new anymore, PHP6 is coming. Most (though not all) servers have PHP5 installed. DOM in PHP 5 is stable, not experimental anymore (manual). It's pretty safe to use, especially for spidering, where simplicity is preferred over stability."

True that. I've experimented with the PHP5 DOM, and it's very nice; I just need to find a use for it now... While PHP5 isn't "new" anymore and PHP6 is coming, there are still a lot of places that require PHP 4.3-compatible code, especially if you're looking to distribute your code at all.
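For code that has to run on both PHP 4.3 and PHP 5 hosts, a small feature check can decide at runtime whether DOMDocument is usable or a string-based fallback is needed. A sketch; `link_strategy` and the 'dom'/'regex' labels are hypothetical names, and it only uses functions that exist in PHP 4.3 too:

```php
<?php
// Hypothetical helper: report which link-extraction strategy this server supports.
// The version check guards against older, incompatible DOM extensions;
// class_exists() confirms the PHP 5 dom extension is actually loaded.
function link_strategy()
{
    if (version_compare(PHP_VERSION, '5.0.0', '>=') && class_exists('DOMDocument')) {
        return 'dom';
    }
    return 'regex';
}

// Fetching remote URLs with file()/loadHTMLFile() also depends on allow_url_fopen.
$remoteOk = (bool) ini_get('allow_url_fopen');
echo 'Strategy: ' . link_strategy() . ', remote access: ' . ($remoteOk ? 'on' : 'off') . "\n";
```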