guddoosk Posted January 21, 2010

I recently joined a firm here. The person who previously held my post copied a script and put it on the local web server. The script fetches Google's PR (PageRank). The problem is that it doesn't seem to work with some sites.

Here is the script, pagerank.php. I have not changed this file.

<?php
class pagerank {
    var $url;

    function pagerank ($url) {
        set_time_limit(0);
        $this->url = parse_url('http://' . ereg_replace('^http://', '', $url));
        $this->url['full'] = 'http://' . ereg_replace('^http://', '', $url);
    }

    function getPage ($url) {
        if (function_exists('curl_init')) {
            $ch = curl_init($url);
            curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
            @curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
            curl_setopt($ch, CURLOPT_USERAGENT, $_SERVER['HTTP_USER_AGENT']);
            curl_setopt($ch, CURLOPT_REFERER, 'http://www.google.com/search?hl=en&q=google&btnG=Google+Search');
            return curl_exec($ch);
        } else {
            return file_get_contents($url);
        }
    }

    function getPagerank () {
        $url = 'info:' . $this->url['host'];
        $checksum = $this->checksum($this->strord($url));
        $url = "http://www.google.com/search?client=navclient-auto&ch=6$checksum&features=Rank&q=$url";
        $data = $this->getPage($url);
        preg_match('#Rank_[0-9]:[0-9]:([0-9]+){1,}#si', $data, $p);
        $value = ($p[1]) ? $p[1] : 0;
        return $value;
    }

    function toInt ($string) {
        return preg_replace('#[^0-9]#si', '', $string);
    }

    function to_int_32 (&$x) {
        $z = hexdec(80000000);
        $y = (int) $x;
        if ($y == -$z && $x < -$z) {
            $y = (int) ((-1) * $x);
            $y = (-1) * $y;
        }
        $x = $y;
    }

    function zero_fill ($a, $b) {
        $z = hexdec(80000000);
        if ($z & $a) {
            $a = ($a >> 1);
            $a &= (~$z);
            $a |= 0x40000000;
            $a = ($a >> ($b - 1));
        } else {
            $a = ($a >> $b);
        }
        return $a;
    }

    function mix ($a, $b, $c) {
        $a -= $b; $a -= $c; $this->to_int_32($a); $a = (int)($a ^ ($this->zero_fill($c, 13)));
        $b -= $c; $b -= $a; $this->to_int_32($b); $b = (int)($b ^ ($a << 8));
        $c -= $a; $c -= $b; $this->to_int_32($c); $c = (int)($c ^ ($this->zero_fill($b, 13)));
        $a -= $b; $a -= $c; $this->to_int_32($a); $a = (int)($a ^ ($this->zero_fill($c, 12)));
        $b -= $c; $b -= $a; $this->to_int_32($b); $b = (int)($b ^ ($a << 16));
        $c -= $a; $c -= $b; $this->to_int_32($c); $c = (int)($c ^ ($this->zero_fill($b, 5)));
        $a -= $b; $a -= $c; $this->to_int_32($a); $a = (int)($a ^ ($this->zero_fill($c, 3)));
        $b -= $c; $b -= $a; $this->to_int_32($b); $b = (int)($b ^ ($a << 10));
        $c -= $a; $c -= $b; $this->to_int_32($c); $c = (int)($c ^ ($this->zero_fill($b, 15)));
        return array($a, $b, $c);
    }

    function checksum ($url, $length = null, $init = 0xE6359A60) {
        if (is_null($length)) {
            $length = sizeof($url);
        }
        $a = $b = 0x9E3779B9;
        $c = $init;
        $k = 0;
        $len = $length;
        while ($len >= 12) {
            $a += ($url[$k + 0] + ($url[$k + 1] << 8) + ($url[$k + 2] << 16) + ($url[$k + 3] << 24));
            $b += ($url[$k + 4] + ($url[$k + 5] << 8) + ($url[$k + 6] << 16) + ($url[$k + 7] << 24));
            $c += ($url[$k + 8] + ($url[$k + 9] << 8) + ($url[$k + 10] << 16) + ($url[$k + 11] << 24));
            $mix = $this->mix($a, $b, $c);
            $a = $mix[0]; $b = $mix[1]; $c = $mix[2];
            $k += 12;
            $len -= 12;
        }
        $c += $length;
        switch ($len) {
            case 11: $c += ($url[$k + 10] << 24);
            case 10: $c += ($url[$k + 9] << 16);
            case 9 : $c += ($url[$k + 8] << 8);
            case 8 : $b += ($url[$k + 7] << 24);
            case 7 : $b += ($url[$k + 6] << 16);
            case 6 : $b += ($url[$k + 5] << 8);
            case 5 : $b += ($url[$k + 4]);
            case 4 : $a += ($url[$k + 3] << 24);
            case 3 : $a += ($url[$k + 2] << 16);
            case 2 : $a += ($url[$k + 1] << 8);
            case 1 : $a += ($url[$k + 0]);
        }
        $mix = $this->mix($a, $b, $c);
        return $mix[2];
    }

    function strord ($string) {
        for ($i = 0; $i < strlen($string); $i++) {
            $result[$i] = ord($string{$i});
        }
        return $result;
    }
}
?>

Here is the front page, index.php. I have changed some of the original script here to accept multiple URLs and to display the results in a table.

<?php
include("pagerank.php");
$url = "";
$pr = "";
?>
<html>
<head>
<title>Get The PR</title>
</head>
<body>
<form method="POST" action="index.php">
type the URLS here:<br />
<textarea name="mytext" rows="10" cols="80"></textarea><br /><?php echo $x ?>
<input type="submit" />
</form>
<?php
$x = $_POST["mytext"];
if (strlen(trim($x)) == 0) {
    print("No URLS");
} else {
    $data = preg_split("/\r\n/", $x, -1, PREG_SPLIT_NO_EMPTY);
    $rno = 1;
    $pr = "";
    echo "<table border='1'>";
    foreach ($data as $value) {
        echo "<tr><td>" . $rno . "</td><td>";
        $K = new pagerank($value);
        $pr = $K->getPagerank();
        echo $value . "</td><td>" . $pr . "</td></tr>";
        $rno = $rno + 1;
        $pr = "";
    }
    echo "</table>";
}
?>
</body>
</html>

The problem seems to be in getting the PR code from the URLs. It often works correctly, but just as often it gives me the following error:

Warning: file_get_contents(http://www.google.com/search?client=navclient-auto&ch=6-72931209&features=Rank&q=info:www.scamfreeinternet.com) [function.file-get-contents]: failed to open stream: HTTP request failed! HTTP/1.0 403 Forbidden in C:\xampp-win32-1.7.3\xampp\htdocs\testPR\pagerank.php on line 22

It also displays the PR of this page as zero. I have been trying to fix it without success.

Here are some websites that generate the error:

http://www.scamfreeinternet.com/
http://bangbambang.wordpress.com/
http://www.pcterritory.net/

And here are some that don't give me the error:

http://dancinginamdo.com/
http://freegamezandsoftware.blogspot.com/

The results are correct for them.
Please advise me on what to do and how to correct this problem so that the script gets the PR of all the sites correctly. I have just joined here and this is my first assignment. I hope you experienced people can help me with this one.

Thanks
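Separately from the 403, the input handling described above can be sketched more defensively. index.php splits the textarea only on "\r\n", and pagerank.php relies on ereg_replace, which was deprecated in PHP 5.3 and removed in PHP 7. The helper names below (splitUrlLines, hostOf) are hypothetical and not part of the original script; this is a minimal sketch of one way to do it, not a drop-in fix.

```php
<?php
// Hypothetical helpers, not part of the original pagerank.php or index.php.

/**
 * Split textarea input into one trimmed URL per entry.
 * \R matches \r\n, \n and \r, so the platform's line ending does not matter.
 */
function splitUrlLines($text)
{
    $lines = preg_split('/\R/', $text, -1, PREG_SPLIT_NO_EMPTY);
    $urls = array();
    foreach ($lines as $line) {
        $line = trim($line);
        if ($line !== '') {
            $urls[] = $line;
        }
    }
    return $urls;
}

/**
 * Extract the host name from a URL that may or may not carry a scheme.
 * Uses preg_replace in place of the removed ereg_replace.
 */
function hostOf($url)
{
    // Strip an existing scheme, then add http:// so parse_url() always sees one.
    $bare = preg_replace('#^https?://#i', '', trim($url));
    $host = parse_url('http://' . $bare, PHP_URL_HOST);
    return is_string($host) ? $host : '';
}
```

With these, the loop in index.php could iterate over splitUrlLines($_POST["mytext"]) and build the "info:" query from hostOf($value).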
teamatomic Posted January 21, 2010

You are getting a 403 Forbidden. Look at the line it refers to. That's your setting of the user agent. $_SERVER['HTTP_USER_AGENT'] holds the user agent of whatever client is requesting a page FROM a server. You have it backwards. Use the agent string of a valid browser. The server is simply forbidding a bot access, which is what it thinks you are, and you are, aren't you?

HTH
Teamatomic
guddoosk (Author) Posted January 21, 2010

So can that be corrected somehow, or do I need to rewrite the whole code? In that case I am dead, as I have no idea what this script does or how it does it. Seriously speaking, I found that this script was copied from the internet. How do the sites that offer this facility do it? I have seen many sites that use similar scripts without any problem. Can anyone recommend a better script for this?

Thanks
teamatomic Posted January 21, 2010

Look up a valid user agent: http://www.user-agents.org/

Put it in a variable:

$useragent='Opera/5.0 (Linux 2.0.38 i386; U) [en]';

Put it where it belongs:

curl_setopt($ch, CURLOPT_USERAGENT, "$useragent");

HTH
Teamatomic
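The suggestion above can be put together roughly as follows. The names pickUserAgent, fetchWithAgent and DEFAULT_USER_AGENT are made up for illustration and are not in the original script; the fallback matters because $_SERVER['HTTP_USER_AGENT'] is not set at all when the script runs from the command line or cron. Treat this as a sketch that assumes the cURL extension is loaded.

```php
<?php
// Hypothetical names for illustration; not part of the original pagerank.php.

// Browser-style User-Agent string quoted in this thread.
const DEFAULT_USER_AGENT = 'Opera/5.0 (Linux 2.0.38 i386; U) [en]';

/**
 * Return the visitor's User-Agent if one was sent, otherwise the fallback.
 */
function pickUserAgent(array $server, $fallback = DEFAULT_USER_AGENT)
{
    $ua = isset($server['HTTP_USER_AGENT']) ? trim($server['HTTP_USER_AGENT']) : '';
    return $ua !== '' ? $ua : $fallback;
}

/**
 * Fetch a URL with cURL using an explicit User-Agent.
 * Returns array(HTTP status code, body) so a 403 can be told apart from a real result.
 * Assumes the cURL extension is available.
 */
function fetchWithAgent($url, $userAgent)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);
    $body = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    return array($status, $body === false ? '' : $body);
}
```

Inside getPage(), a call such as fetchWithAgent($url, pickUserAgent($_SERVER)) would then report the status code alongside the body, instead of silently returning whatever came back.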
guddoosk (Author) Posted January 21, 2010

Thanks. I changed the statements to:

$usr_agent='Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.9.1.7) Gecko/20091221 Firefox/3.5.7 GTBDFff GTB7.0';
curl_setopt($ch, CURLOPT_USERAGENT, "$usr_agent");

But the same thing happens.
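One thing that may explain why the change had no effect: the warning in the first post is raised by file_get_contents, not by cURL. That suggests function_exists('curl_init') is false on that XAMPP install, so the else branch of getPage() runs and the curl_setopt lines are never reached. In that case the user agent would have to be passed to file_get_contents through a stream context. The helper name buildHttpContext and the 10-second timeout below are assumptions for illustration; this is a sketch, and it does not guarantee Google will stop answering 403.

```php
<?php
// Hypothetical helper; the name and the default timeout are illustrative only.

/**
 * Build an HTTP stream context that sends the given User-Agent header.
 * 'user_agent' and 'timeout' are standard HTTP context options.
 */
function buildHttpContext($userAgent, $timeoutSeconds = 10)
{
    return stream_context_create(array(
        'http' => array(
            'method'     => 'GET',
            'user_agent' => $userAgent,
            'timeout'    => $timeoutSeconds,
        ),
    ));
}

// Usage inside the else branch of getPage() would look like:
//   return file_get_contents($url, false, buildHttpContext($usr_agent));
```

Alternatively, enabling the cURL extension in php.ini would make the existing curl_setopt change take effect.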