
Detect search bot and bypass cookie.


JayVee


I'm trying to allow search engines to index pages on my site that require a cookie. (I want the code to detect indexing by search bots and let them bypass the restriction.) Here is my code. It doesn't work.

 

	
if ( strstr($_SERVER['HTTP_USER_AGENT'], "Googlebot" ) == true ){
  //User has the GoogleBot user agent, but is it a real google bot?
  $host = gethostbyaddr($_SERVER['REMOTE_ADDR']);
  if ( substr($host, (strlen($host)-13)) == 'googlebot.com' )
  {
  }
  	//real bot

  else
  	//fake bot or general access to page
if(!isset($_COOKIE['legal'])) {  	
header("Location: /index.php");	 
}

if($_COOKIE['legal'] == "no")
		{
		header("Location: /index.php");	
		}		

 

Even if I get this to work, it will only work for Google.

Is there a better way to go about this? I don't need to log the activity of the search bots. I just want them to gain access to the site.
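
One way to avoid hard-coding a single crawler is to keep a list of hostname suffixes and compare the reverse-DNS name against each of them. The following is only a minimal sketch: the function name `hostBelongsToCrawler` and the suffix list are assumptions for illustration, and each search engine's own documentation should be checked for the domains it actually uses. Including the leading dot in each suffix also avoids a weakness of comparing only the last 13 characters, which a host such as `evilgooglebot.com` would pass.

```php
<?php
// Hypothetical helper: true when $host ends with one of the given domain suffixes.
// The leading dot in each suffix stops "evilgooglebot.com" from matching.
function hostBelongsToCrawler($host, array $suffixes)
{
    $host = strtolower(rtrim($host, '.'));
    foreach ($suffixes as $suffix) {
        $suffix = strtolower($suffix);
        if (substr($host, -strlen($suffix)) === $suffix) {
            return true;
        }
    }
    return false;
}

// Assumed suffix list, for illustration only.
$crawlerSuffixes = array('.googlebot.com', '.google.com', '.search.msn.com');
```

On a live page the host would come from `gethostbyaddr($_SERVER['REMOTE_ADDR'])`, as in the code above.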

https://forums.phpfreaks.com/topic/133325-detect-search-bot-and-bypass-cookie/

<?php
if ( strstr($_SERVER['HTTP_USER_AGENT'], "Googlebot" ) == true ){  // condition (1)
//User has the GoogleBot user agent, but is it a real google bot?
$host = gethostbyaddr($_SERVER['REMOTE_ADDR']);
if ( substr($host, (strlen($host)-13)) == 'googlebot.com' ){  // condition (2)
	//real bot
} 
else if (!isset($_COOKIE['legal'])) {     //condition (3)
	header("Location: /index.php");   
}

//The code below is executed even if conditions (1) and (2) are true.
if($_COOKIE['legal'] == "no"){ //condition (4)
	header("Location: /index.php");   
}

// you're missing a } here

OK, I tried that, but it bypassed the cookie security on the site. Pages displayed even when the cookie didn't exist.

<?php 	

if ( strstr($_SERVER['HTTP_USER_AGENT'], "Googlebot" ) == true ){  // condition (1)
//User has the GoogleBot user agent, but is it a real google bot?
$host = gethostbyaddr($_SERVER['REMOTE_ADDR']);
if ( substr($host, (strlen($host)-13)) == 'googlebot.com' ){  // condition (2)
	//real bot
} 
else if (!isset($_COOKIE['legal'])) {     //condition (3)
	header("Location: /index.php");   
}

//The code below is executed even if conditions (1) and (2) are true.
if($_COOKIE['legal'] == "no"){ //condition (4)
	header("Location: /index.php");   
}

}

?>

 

I want condition 1 and 2 to be checked. If true then I want the page to be viewable without any more condition checks. If false then I want both condition 3 and 4 to be checked. Do I need to put condition 4 into an ELSE IF statement?
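
The flow described here can be sketched as a small function that receives the outcome of conditions (1) and (2) and only then looks at the cookie. This is a sketch under assumptions: `needsRedirect` is a hypothetical name, and the cookie array is passed in as a parameter so the logic can be exercised without a browser.

```php
<?php
// Hypothetical helper: decide whether the visitor must be sent back to /index.php.
// $isVerifiedBot : result of conditions (1) and (2) combined
// $cookies       : normally $_COOKIE, passed in so the logic can be tested
function needsRedirect($isVerifiedBot, array $cookies)
{
    if ($isVerifiedBot) {
        return false;                   // real bot: skip the cookie checks entirely
    }
    if (!isset($cookies['legal'])) {
        return true;                    // condition (3): cookie missing
    }
    return $cookies['legal'] == "no";   // condition (4): cookie says "no"
}
```

In the page itself this could be used as `if (needsRedirect($isVerifiedBot, $_COOKIE)) { header("Location: /index.php"); exit; }`. The `exit` matters because `header()` on its own does not stop the rest of the script from running.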

 

Basically you need to enclose them in {} after else

<?php
if ( strstr($_SERVER['HTTP_USER_AGENT'], "Googlebot" ) == true ){  // condition (1)
//User has the GoogleBot user agent, but is it a real google bot?
$host = gethostbyaddr($_SERVER['REMOTE_ADDR']);
if ( substr($host, (strlen($host)-13)) == 'googlebot.com' ){  // condition (2)
	//real bot
} 
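else {
	if (!isset($_COOKIE['legal'])) {     //condition (3)
		header("Location: /index.php");
		exit; // stop here so condition (4) does not read a missing cookie
	}

	if($_COOKIE['legal'] == "no"){ //condition (4)
		header("Location: /index.php");
		exit; // header() alone does not end the script
	}
}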
}

 

OK, I rewrote it as this:

 
<?php 	

if ( strstr($_SERVER['HTTP_USER_AGENT'], "Googlebot" ) == true ){  // condition (1)
//User has the GoogleBot user agent, but is it a real google bot?
$host = gethostbyaddr($_SERVER['REMOTE_ADDR']);
if ( substr($host, (strlen($host)-13)) == 'googlebot.com' ){  // condition (2)
	//real bot
} 
}
else if (!isset($_COOKIE['legal'])) {     //condition (3)
	header("Location: /index.php");   
}

else 
if($_COOKIE['legal'] == "no"){ //condition (4)
	header("Location: /index.php");   
}


?>

 

It seems to work, but Google is still failing to index pages.

Any advice?

 

if ( strstr($_SERVER['HTTP_USER_AGENT'], "Googlebot" ) == true ){  // condition (1)

 

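That if is technically wrong. strstr returns either a string or FALSE, never TRUE, so comparing it with == true only works through loose type juggling and can give a wrong result. (With strpos the result could even be 0 for a match at the very start.) Compare strictly against FALSE instead.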

 

if ( strstr($_SERVER['HTTP_USER_AGENT'], "Googlebot" ) !== FALSE ){  // condition (1)

 

That should produce the right result for that part. If that is the problem, I do not know. Just saw that it was wrong =).
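
To make the difference concrete, here is a small sketch of loose versus strict comparison. The user-agent string is only a sample value chosen so that the match falls at offset 0; it is not taken from the thread.

```php
<?php
// Sample user agent, assumed for illustration.
$ua = "Googlebot/2.1 (+http://www.google.com/bot.html)";

// strpos() returns the offset of the match, which is 0 when the
// user agent starts with "Googlebot"; 0 is falsy in a loose comparison.
$offset = strpos($ua, "Googlebot");
$looseCheck  = ($offset == true);     // false, even though there is a match
$strictCheck = ($offset !== false);   // true

// strstr() returns the rest of the string from the match, or false.
$strstrCheck = (strstr($ua, "Googlebot") !== false);   // true
```

Either function works for condition (1) as long as the result is compared with `!== false`.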

OK, I tried that and pages still failed to index.

Nothing on my site is being indexed by Google. (I'm using Google Webmaster Tools to submit a sitemap and view diagnostics.)

Is it possible to recreate this problem locally using a value other than Googlebot so that I can test my code? That way I could see whether my code is working correctly.
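
One possible way to test this locally is to pass the bot name, the host suffix and the DNS lookup into the check as parameters, so a made-up bot can stand in for Googlebot. This is a sketch under assumptions: `isVerifiedBot`, `TestBot` and `.testbot.example` are hypothetical names introduced for the example, and the stub resolver replaces real reverse DNS.

```php
<?php
// Hypothetical helper: conditions (1) and (2) with the bot name, host suffix
// and DNS lookup passed in, so a made-up bot can be used on a local machine.
function isVerifiedBot($userAgent, $ip, $token, $hostSuffix, $resolver)
{
    if (strpos($userAgent, $token) === false) {
        return false;                        // condition (1) failed
    }
    $host = call_user_func($resolver, $ip);  // gethostbyaddr on the live site
    return is_string($host)
        && substr($host, -strlen($hostSuffix)) === $hostSuffix; // condition (2)
}

// Local test: a fake bot name and a stub resolver instead of real DNS.
$stubResolver = function ($ip) {
    return 'crawler-1.testbot.example';
};
$allowed = isVerifiedBot('TestBot/1.0', '127.0.0.1', 'TestBot', '.testbot.example', $stubResolver);
$denied  = isVerifiedBot('Mozilla/5.0', '127.0.0.1', 'TestBot', '.testbot.example', $stubResolver);
```

On the live site the same function could be called with `'Googlebot'`, `'.googlebot.com'` and `'gethostbyaddr'`. To exercise the page itself, a request with a custom user agent can be sent from the command line, for example `curl -A "TestBot/1.0" http://localhost/page.php` (the URL is a placeholder).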

Archived

This topic is now archived and is closed to further replies.
