
need help making file() open search string


XeroXer


Hi there!

Made a little script to extract all the links from a site.

It works like a charm as it is, but I need it to be able to read a URL with a query string, such as http://www.google.com/search?q=xeroxer. When I try that, I get this error:

Warning: file(http://www.google.com/search?q=xeroxer) [function.file]: failed to open stream: HTTP request failed! HTTP/1.0 403 Forbidden in /customers/_/_/httpd.www/-/index.php on line 2

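<?php
// Read the page into an array of lines (needs remote file access enabled).
$innehall = file('http://www.google.se/');
foreach ($innehall as $enrad)
{
    // Keep everything from the first 'http://' to the end of the line.
    $slicerad = strstr($enrad, 'http://');
    // Reverse, cut at what is now the last '"', and reverse back:
    // this leaves the URL up to and including its closing quote.
    $slicerad = strrev($slicerad);
    $slicerad = strrchr($slicerad, '"');
    $slicerad = strrev($slicerad);
    $slicerad = rtrim($slicerad, '"');
    // Print the link, if the line contained one.
    if ($slicerad != "")
    {
        echo $slicerad;
        echo "<br>\n";
    }
}
?>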

 

So I was wondering if someone had any idea on how to make it work.

And if anyone has ideas for making the code better (though it works), you are welcome to post them too.
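A possible explanation for the 403, stated as an assumption since nothing in the thread confirms it: file() sends no User-Agent header by default, and some sites refuse requests without one. The sketch below passes a stream context carrying a user agent to file(). The helper names buildContext and fetchLines and the agent string are made up for illustration, and a site may still refuse automated requests regardless of the header.

```php
<?php
// Hypothetical helper: build an HTTP stream context that sends a User-Agent.
function buildContext($userAgent)
{
    return stream_context_create(array(
        'http' => array(
            'method'     => 'GET',
            'user_agent' => $userAgent, // assumed value, pick your own
            'timeout'    => 10,         // seconds
        ),
    ));
}

// Hypothetical helper: read a URL (or file) into an array of lines.
// Returns an empty array when the request fails.
function fetchLines($url, $context)
{
    $lines = @file($url, 0, $context);
    return ($lines === false) ? array() : $lines;
}
```

Usage would look like $innehall = fetchLines('http://www.google.com/search?q=xeroxer', buildContext('Mozilla/5.0 (compatible; LinkLister/0.1)')); and the rest of the loop stays the same. Remote file access still has to be enabled for this to work.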


Your code misses a lot of links. The code below uses DOM and can access http://www.google.com/search?q=xeroxer with no problem, provided that you have enabled remote file access (allow_url_fopen) in your PHP configuration.

 

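<?php
    // Real-world pages are rarely valid HTML; keep parser warnings quiet.
    libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $doc->loadHTMLFile("http://www.google.com/search?q=xeroxer");

    $linkNodes = $doc->getElementsByTagName("a");

    for ($i = 0; $i < $linkNodes->length; $i++)
        echo getLink($linkNodes->item($i)) . "<br>";

// Return the href of a node, or an empty string if it has none.
// The node is taken by value: item() returns a result, not a variable,
// so it cannot be passed by reference.
function getLink(DOMNode $dn)
{
    $attrb = $dn->attributes;
    $link = $attrb->getNamedItem('href');
    return $link ? $link->nodeValue : '';
}
?>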

DOM is perfect for writing a spider.
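To try the DOM approach without touching the network, the same idea can be applied to an HTML string. This is only a sketch: the function name extractLinks and the sample markup are invented for illustration, and the filter that keeps only absolute http(s) links is an assumption about what a spider would want.

```php
<?php
// Hypothetical helper: return every absolute http(s) link found in $html.
function extractLinks($html)
{
    // Collect parse errors silently instead of printing warnings.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $doc->loadHTML($html);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = array();
    foreach ($doc->getElementsByTagName('a') as $anchor) {
        // getAttribute() returns '' when the attribute is missing.
        $href = $anchor->getAttribute('href');
        if (preg_match('#^https?://#i', $href)) {
            $links[] = $href;
        }
    }
    return $links;
}

// Sample input: two absolute links, one anchor without href, one relative link.
$sample = '<p><a href="http://example.com/a">A</a> <a name="x">no href</a>'
        . '<a href="/relative">R</a> <a href="https://example.org/b">B</a></p>';

foreach (extractLinks($sample) as $url) {
    echo $url . "<br>\n";
}
```

Unlike the line-by-line string slicing, this picks up every anchor regardless of how many share a line, and it skips anchors that have no href at all.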


Provided that you enabled remote file access on your PHP

...

DOM is perfect for writing a spider.

 

Therein lies the problem, I believe. If you have remote file access turned on, you shouldn't have to worry about messing with DOM in this case. Beyond that, recommending the use of DOMDocument assumes that PHP5 is installed, which is not yet a safe assumption. I agree that it may be very useful, but more information about the server is required before recommending a rewrite that drastic ;)
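Both open questions here (is remote file access on, and is the DOM extension available) can be checked from a script. A minimal sketch, with the function name checkEnvironment and the array keys chosen purely for illustration:

```php
<?php
// Hypothetical helper: report which of the two approaches this server supports.
function checkEnvironment()
{
    return array(
        // file('http://...') only works when URL wrappers are enabled.
        'remote_files' => (bool) ini_get('allow_url_fopen'),
        // DOMDocument is provided by PHP 5's DOM extension.
        'dom'          => class_exists('DOMDocument'),
        'php_version'  => PHP_VERSION,
    );
}

foreach (checkEnvironment() as $name => $value) {
    echo $name . ': ' . var_export($value, true) . "\n";
}
```

If remote_files comes back false, neither file() nor loadHTMLFile() can open a URL, whichever PHP version is installed.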


You're right, I forgot to mention PHP5.

 

PHP5 is not very new anymore; PHP6 is coming. Most servers (though not all) have PHP5 installed.

DOM in PHP 5 is stable, not experimental anymore (manual).  It's pretty safe to use, especially for 'spidering' purposes, where simplicity is preferred over stability and reliability.


PHP5 is not very new anymore; PHP6 is coming. Most servers (though not all) have PHP5 installed.

DOM in PHP 5 is stable, not experimental anymore (manual).  It's pretty safe to use, especially for 'spidering' purposes, where simplicity is preferred over stability.

 

True that. I've experimented with the PHP5 DOM, and it's very nice. I just need to find a use for it now ;) ... While PHP5 is not "new" anymore, and PHP6 is coming, there are still a lot of places that require 4.3 compatibility with code... especially if you're looking to distribute your code at all.
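For code that has to run where DOMDocument is not available, a regular expression is one alternative. The sketch below uses only preg_match_all(), which already existed in PHP 4, and it catches every quoted http:// URL rather than just the first one on each line. The function name findQuotedLinks and the sample line are invented for illustration, and the pattern assumes URLs are wrapped in double quotes.

```php
<?php
// Hypothetical helper: collect every double-quoted http:// URL in $html.
function findQuotedLinks($html)
{
    // Match "http://..." up to the closing quote, for all occurrences.
    preg_match_all('/"(http:\/\/[^"]*)"/i', $html, $matches);
    return $matches[1]; // captured URLs, without the quotes
}

// Sample input with two links on a single line.
$line = '<a href="http://example.com/one">1</a> <a href="http://example.com/two">2</a>';

foreach (findQuotedLinks($line) as $url) {
    echo $url . "<br>\n";
}
```

A regular expression is less robust than a real parser (it will also match quoted URLs outside of anchor tags), so this is a compatibility fallback rather than a replacement for the DOM version.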
