strpos() returning empty


hellonoko

I am using strpos() to compare URLs.

 

However, inside my function it doesn't seem to return anything. When I copy the same bit of code into its own page, or move it outside of my function, it works.

 

Any ideas?

 

The strpos() call is on line 74 of the code below, near the end of crawl_page().

 

Thanks.

 

<?php

//error_reporting(E_ALL);

//echo $site_url = 'http://www.empreintes-digitales.fr/';
$target_url = "http://www.empreintes-digitales.fr";

//$target_url = 'http://redthreat.wordpress.com/';
//$target_url= 'http://www.kissatlanta.com/blog/';
//$target_url= 'http://www.empreintes-digitales.fr/';

$url = "";
$link = "";

$userAgent = 'Googlebot/2.1 (http://www.googlebot.com/bot.html)';

crawl_page( $target_url, $userAgent);

function crawl_page( $target_url, $userAgent)
{
	$ch = curl_init();

	curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);
	curl_setopt($ch, CURLOPT_URL,$target_url);
	curl_setopt($ch, CURLOPT_FAILONERROR, true);
	curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
	curl_setopt($ch, CURLOPT_AUTOREFERER, true);
	curl_setopt($ch, CURLOPT_RETURNTRANSFER,true);
	curl_setopt($ch, CURLOPT_TIMEOUT, 10);

	$html = curl_exec($ch);

	if (!$html) 
	{
		echo "<br />cURL error number:" .curl_errno($ch);
		echo "<br />cURL error:" . curl_error($ch);
		exit;
	}

	//
	// load scraped data into the DOM
	//

	$dom = new DOMDocument();
	@$dom->loadHTML($html);

	//
	// get only LINKS from the DOM with XPath
	//

	$xpath = new DOMXPath($dom);
	$hrefs = $xpath->evaluate("/html/body//a");

	//
	// go through all the links and store to db or whatever
	//
	for ($i = 0; $i < $hrefs->length; $i++) 
	{
		$href = $hrefs->item($i);
		$url = $href->getAttribute('href');

		$links_1[] = $url; // append each href; $link is not defined inside this function

		//echo $absolute_links[$link] = relative2absolute($target_url, $url);
		//echo '<br>';

		//if the $url does not contain the web site base address: http://www.thesite.com/ then add it onto the front


		echo gettype($url);
		echo gettype($target_url);

		echo '<b>';
		echo $pos = strpos($url , $target_url);
		echo '</b>';

		if ( $pos === FALSE ) // strict check: a match at offset 0 is not a miss
		{
			echo 'INCOMPLETE: '.$url;
			echo '<br>';
		}
		else
		{
			echo 'COMPLETE: '.$url;
			echo '<br>';
		}

	}
}


If strpos() returns FALSE, then yes, echoing it prints nothing.

<?php
echo "{".strpos('abc', 'a')."}<br>";
echo "{".strpos('abc', 'b')."}<br>";
echo "{".strpos('abc', 'c')."}<br>";
echo "{".strpos('abc', 'd')."}<br>";
?>

That code outputs:

{0}
{1}
{2}
{}

 

I think this explains it pretty well:

http://us2.php.net/manual/en/function.strpos.php
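
For what it's worth, the empty output is only how echo renders FALSE. Here is a minimal sketch of checking the result with a strict comparison instead of echoing it. describe_pos() is a made-up helper name for this example, not something from the manual. The strict === matters because a loose == FALSE would also treat a match at position 0 as "not found".

```php
<?php
// Hypothetical helper (not from the thread or the manual):
// turn the result of strpos() into a readable message.
function describe_pos($haystack, $needle)
{
    $pos = strpos($haystack, $needle);

    // strpos() returns an int offset on success and false on failure.
    // Use === so that offset 0 is not confused with false.
    if ($pos === false) {
        return 'not found';
    }

    return 'found at ' . $pos;
}

echo describe_pos('abc', 'a'), "\n"; // found at 0
echo describe_pos('abc', 'c'), "\n"; // found at 2
echo describe_pos('abc', 'd'), "\n"; // not found
```

When debugging, var_dump($pos) is also handy, since it shows bool(false) and int(0) explicitly rather than an empty string or a bare 0.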

 


To be more specific:

 

function checkURL($url, $target_url)
{
	echo $url.'<br>';
	echo $target_url.'<br>';

	echo gettype($url).'<br>';
	echo gettype($target_url).'<br>';

	echo '<b>';
	echo $pos = strpos($url , $target_url);
	echo '</b>';


}

 

Returns:

http://empreintes-digitales.fr/board/register.php
http://www.empreintes-digitales.fr
string
string
http://empreintes-digitales.fr/board/login.php?action=forget
http://www.empreintes-digitales.fr
string
string
#
http://www.empreintes-digitales.fr
string
string
http://66.102.9.104/translate_c?hl=fr&sl=fr&tl=en&u=www.empreintes-digitales.fr/index.php
http://www.empreintes-digitales.fr
string
string

 

And so on. Nothing is ever printed by

$pos = strpos()

 

 


Actually, it is comparing correctly.

 

"http://empreintes-digitales.fr/board/register.php"

simply does not contain the string:

"http://www.empreintes-digitales.fr"

 

Perhaps you want to shorten your $target_url a bit?

I don't know what the purpose of this is, but you could strip the target URL down to its top-level and second-level domain and get far better matches,

since

"http://empreintes-digitales.fr/board/register.php"

does contain

"empreintes-digitales.fr"

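One way to do that stripping, as a rough sketch: compare only the host names, with any leading "www." removed. bare_host() and same_site() are hypothetical names invented for this example; parse_url() is the standard PHP function. It assumes that "www.example.com" and "example.com" should count as the same site, which may not hold everywhere.

```php
<?php
// Hypothetical helper: reduce a URL to its lowercase host without a leading "www.".
// Returns null when the URL has no host (relative links, "#", malformed input).
function bare_host($url)
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host)) {
        return null;
    }

    $host = strtolower($host);

    // Strict === 0: only strip "www." when it is at the very start.
    if (strpos($host, 'www.') === 0) {
        $host = substr($host, 4);
    }

    return $host;
}

// Hypothetical helper: true when both URLs point at the same bare host.
function same_site($url, $target_url)
{
    $host = bare_host($url);

    return $host !== null && $host === bare_host($target_url);
}

$target_url = 'http://www.empreintes-digitales.fr';

var_dump(same_site('http://empreintes-digitales.fr/board/register.php', $target_url)); // bool(true)
var_dump(same_site('#', $target_url));                                                 // bool(false)
```

Comparing hosts rather than searching for a substring also keeps the translate link from the output above from matching, since the domain there only appears in the query string.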
 
