
Was working but suddenly doesn't :(


jamesxg1


Hiya guys,

Is there a problem in this script? I can't get it working for the love of God. It was working fine, but now all it does is give me a blank screen, and I haven't touched it at all.

 

 

exe.php

 

<?php  
  
class tagSpider  {  
  
var $crl;  
var $html;   
var $binary;   
var $url;  
  
  
function tagSpider()  {  

$this->html = "";  
$this->binary = 0;  
$this->url = ""; 

}  
  

function fetchPage($url)  {  
  

$this->url = $url;  
  
if (isset($this->url)) {  
  

$this->ch = curl_init ();  
  
$this->useragent = "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.1) Gecko/20061204 Firefox/2.0.0.1";

curl_setopt($this->ch, CURLOPT_RETURNTRANSFER, 1);  
curl_setopt($this->ch, CURLOPT_URL, $this->url);   
curl_setopt($this->ch, CURLOPT_FOLLOWLOCATION, true);    
curl_setopt($this->ch, CURLOPT_BINARYTRANSFER, $this->binary);   
curl_setopt($this->ch, CURLOPT_USERAGENT, $this->useragent);
  

$this->html = curl_exec($this->ch);   
  

curl_close ($this->ch);  
}  
  
}  
  

function parse_array($beg_tag, $close_tag)  {  

preg_match_all("($beg_tag.*$close_tag)siU", $this->html, $matching_data);   
  

return $matching_data[0];  
}  
  
}  
?> 


 

 

run.php

 

<?php

include 'exe.php';

$interval = 10;
$limit = 995;

for ($i = 0; $i <= $limit; $i += $interval) {


$urlrun="http://maps.google.co.uk/maps?f=q&source=s_q&hl=en&geocode=&q=indian+delivery+in+berkshire&sll=52.173511,-0.411301&sspn=0.205075,0.441513&ie=UTF8&ll=52.089633,-0.411301&spn=0.205462,0.441513&z=11&view=text&ei=ir1kSuisJcnKjAfK-Y3cCw&mpnum=9&attrid=&sa=N&start=$i";  

$stag='<span id=title class="fn org" dir=ltr>';
$etag="</span></span>&#8206;</div>"; 
  
$tspider = new tagSpider();  
$tspider->fetchPage($urlrun);  
$linkarray = $tspider->parse_array($stag, $etag); 

foreach ($linkarray as $result) { 

$string = preg_replace('~<div align=left><div class=rescat>Category:.*?<span><a class=f href=/maps?f=q&source=s_q&hl=en&geocode=&q=indian+delivery+in+berkshire&sll=52.173511,-0.411301&sspn=0.205075,0.441513&ie=UTF8&view=text&ei=tvJkSuGYAoKGjAekw8CRDA&attrid=&latlng=16738287191014861748&cd=1&dtab=2&pcsi=16738287191014861748,0 log=miwd id=nrev_A>1 review</a> - <a class=f nw href=https://www.google.com/accounts/ServiceLogin?service=local&hl=en&nui=1&continue=http://maps.google.co.uk/maps%3Ff%3Dq%26source%3Ds_q%26hl%3Den%26geocode%3D%26q%3Dindian%2Bdelivery%2Bin%2Bberkshire%26sll%3D52.173511,-0.411301%26sspn%3D0.205075,0.441513%26ie%3DUTF8%26view%3Dtext%26ei%3DtvJkSuGYAoKGjAekw8CRDA%26attrid%3D%26dtab%3D2%26cid%3D16738287191014861748%26iwd%3D1%26iwloc%3DA%26action%3Dopen log=miwd id=wrev_A>Write a review</a></span></div>~is','',$result);
$stringtop = preg_replace('~</a><span><a class=f href=.*?Write a review</a></span></div>~is',',',$string);
$stringbottom = preg_replace('~</a><span><a class="f nw" href=https://.*?Write a review</a></span></div>~is',',',$stringtop);
$stringmiddle = preg_replace('~</span>&#8206;<div><div>.*?<span class=adr id=adr dir=ltr>~is','',$stringbottom);
$stringfinal = preg_replace('~<div><a href=.*?</span></a>~is','',$stringmiddle);
$bored = preg_replace('~<span .*?>~is','',$stringfinal);
$verybored = preg_replace('~</sp.*?an>~is','',$bored);
$sleep = preg_replace('~</di.*?v>~is','',$verybored);
$close = preg_replace('~&#82.*?06;~is','',$sleep);
$to = preg_replace('~<b.*?>~is','',$close);
$death = preg_replace('~<di.*?v>~is','',$to);
$done = preg_replace('~</b.*?>~is','',$death);

$content = str_replace(" - ", ',', "$done");
$contentt = str_replace(",", '","', "$content");
$contents =  '("' . $contentt . '"),' . "\r\n";
$open = fopen('data.txt', "a+");
$write = fwrite($open, $contents);

echo $contents . "\n";

  }
}
?>

 

Many thanks,

 

James.


Please?

This is badly needed, and I don't understand why it is not working. It was, but now I just get a blank screen.

I spent a lot of time on this, believe it or not, so it would be much appreciated if someone could help me here.

 

Many thanks,

 

James.


Something must have changed for it to suddenly stop working.

Did you check the modified dates of the files to see whether they were newer than expected?

On my server I had the same issue with a script that would white-screen without showing any PHP errors. My problem was that I had added a field to my MySQL query which didn't actually exist in the database, so the results were not loading.


Agreed, but honestly I have not touched them. I was running them via my browser, refreshed the page, and it hasn't worked since.


I know someone mentioned this but put these two lines in both of your files directly after your opening PHP tag:

 

ini_set ("display_errors", "1");
error_reporting(E_ALL);

 

This should have also picked up the single quote error...
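
If the screen stays blank even with errors switched on, it may be that PHP is fine and the fetch itself is failing: curl_exec() returns false on failure, and the class in the first post never checks for that. A rough sketch of such a check follows. The function names describeFetch and fetchOrExplain are made up for illustration, and the 15 second timeout is an assumed value, not something from the original script.

```php
<?php
// Hypothetical helpers, not from the thread.

// Turn the three values cURL gives back into a verdict that can be echoed.
function describeFetch($body, string $error, int $status): array
{
    if ($body === false) {
        return ['ok' => false, 'reason' => "cURL error: $error"];
    }
    if ($status !== 200) {
        return ['ok' => false, 'reason' => "HTTP status $status"];
    }
    if ($body === '') {
        return ['ok' => false, 'reason' => 'empty body'];
    }
    return ['ok' => true, 'reason' => ''];
}

// Fetch a URL and return the verdict together with the body.
function fetchOrExplain(string $url): array
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 15); // assumed value, so a hung request cannot stall the page

    $body   = curl_exec($ch);                              // string on success, false on failure
    $error  = curl_error($ch);                             // '' when there was no transport error
    $status = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE); // 0 if no response arrived

    return describeFetch($body, $error, $status) + ['body' => $body];
}
```

Calling fetchOrExplain($urlrun) and echoing the 'reason' entry whenever 'ok' is false should at least show whether the request is the part that broke.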


Done, still nothing. I also added

ob_start();

to see if that helps.

 

<?php ob_start();

ini_set ("display_errors", "1");
error_reporting(E_ALL);

include 'exe.php';

$interval = 10;
$limit = 995;

for ($i = 0; $i <= $limit; $i += $interval) {


$urlrun="http://maps.google.co.uk/maps?f=q&source=s_q&hl=en&geocode=&q=indian+delivery+in+berkshire&sll=52.173511,-0.411301&sspn=0.205075,0.441513&ie=UTF8&ll=52.089633,-0.411301&spn=0.205462,0.441513&z=11&view=text&ei=ir1kSuisJcnKjAfK-Y3cCw&mpnum=9&attrid=&sa=N&start=$i";  

$stag='<span id=title class="fn org" dir=ltr>';
$etag="</span></span>&#8206;</div>"; 
  
$tspider = new tagSpider();  
$tspider->fetchPage($urlrun);  
$linkarray = $tspider->parse_array($stag, $etag); 

foreach ($linkarray as $result) { 

$string = preg_replace('~<div align=left><div class=rescat>Category:.*?<span><a class=f href=/maps?f=q&source=s_q&hl=en&geocode=&q=indian+delivery+in+berkshire&sll=52.173511,-0.411301&sspn=0.205075,0.441513&ie=UTF8&view=text&ei=tvJkSuGYAoKGjAekw8CRDA&attrid=&latlng=16738287191014861748&cd=1&dtab=2&pcsi=16738287191014861748,0 log=miwd id=nrev_A>1 review</a> - <a class=f nw href=https://www.google.com/accounts/ServiceLogin?service=local&hl=en&nui=1&continue=http://maps.google.co.uk/maps%3Ff%3Dq%26source%3Ds_q%26hl%3Den%26geocode%3D%26q%3Dindian%2Bdelivery%2Bin%2Bberkshire%26sll%3D52.173511,-0.411301%26sspn%3D0.205075,0.441513%26ie%3DUTF8%26view%3Dtext%26ei%3DtvJkSuGYAoKGjAekw8CRDA%26attrid%3D%26dtab%3D2%26cid%3D16738287191014861748%26iwd%3D1%26iwloc%3DA%26action%3Dopen log=miwd id=wrev_A>Write a review</a></span></div>~is','',$result);
$stringtop = preg_replace('~</a><span><a class=f href=.*?Write a review</a></span></div>~is',',',$string);
$stringbottom = preg_replace('~</a><span><a class="f nw" href=https://.*?Write a review</a></span></div>~is',',',$stringtop);
$stringmiddle = preg_replace('~</span>&#8206;<div><div>.*?<span class=adr id=adr dir=ltr>~is','',$stringbottom);
$stringfinal = preg_replace('~<div><a href=.*?</span></a>~is','',$stringmiddle);
$bored = preg_replace('~<span .*?>~is','',$stringfinal);
$verybored = preg_replace('~</sp.*?an>~is','',$bored);
$sleep = preg_replace('~</di.*?v>~is','',$verybored);
$close = preg_replace('~&#82.*?06;~is','',$sleep);
$to = preg_replace('~<b.*?>~is','',$close);
$death = preg_replace('~<di.*?v>~is','',$to);
$done = preg_replace('~</b.*?>~is','',$death);

$content = str_replace(" - ", ',', "$done");
$contentt = str_replace(",", '","', "$content");
$contents =  '("' . $contentt . '"),' . "\r\n";
$open = fopen('data.txt', "a+");
$write = fwrite($open, $contents);

echo $contents . "\n";

  }
}
?>


Add this print_r between these two lines:

 

$linkarray = $tspider->parse_array($stag, $etag); 
print_r($linkarray);
foreach ($linkarray as $result) { 

 

That prints

Array ( )

If it makes any difference, when I run the page it doesn't stop loading. It just keeps going as if the script were massive, and it took a while for the Array ( ) to be printed, so I'm guessing some piece of code above that is causing it.
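
An empty array here means the pattern matched nothing in whatever came back, so the next thing to look at is the fetched HTML rather than the regex chain further down. Note also that the loop runs 100 times (start=0 to 990 in steps of 10), and with ob_start() in place nothing is shown until the script finishes, which by itself could explain the long load. Below is a small sketch for narrowing it down. The names diagnoseMatch and extractBetween are invented for this example. extractBetween does the same job as parse_array() but runs the tags through preg_quote() so characters like "." or "?" are matched literally.

```php
<?php
// Hypothetical helpers, not from the thread.

// Say which part of a start-tag/end-tag search is missing from the page.
function diagnoseMatch(string $html, string $startTag, string $endTag): string
{
    if ($html === '') {
        return 'empty response';
    }
    if (stripos($html, $startTag) === false) {
        return 'start tag not found';
    }
    if (stripos($html, $endTag) === false) {
        return 'end tag not found';
    }
    return 'both tags present';
}

// Return every start-tag ... end-tag span, shortest match first,
// case-insensitive and across line breaks.
function extractBetween(string $html, string $startTag, string $endTag): array
{
    $pattern = '~' . preg_quote($startTag, '~') . '.*?' . preg_quote($endTag, '~') . '~si';
    preg_match_all($pattern, $html, $matches);
    return $matches[0];
}
```

If diagnoseMatch() reports 'start tag not found' for a page that used to work, the markup being returned is probably no longer the results page the script was written against.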


Morning peepz,

Sorry to keep this thread going, I just need some help.

I was doing some research last night, and because I'm trying to scrape a Google page, Google blocks it after a certain number of requests.

Is there anything I can do to get around this block?

I have changed my user agent to Googlebot, but it still isn't giving me any love.

Any ideas would be really appreciated.

 

Many thanks,

 

James.
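
Changing the user agent is unlikely to help if the block is triggered by the number of requests, which is what the description above suggests. Firing 100 requests back to back is probably what sets it off. What the script can do is notice the block, stop, and leave longer gaps between requests. A minimal sketch is below. The function names are made up, and the status codes (429, 503) and the "captcha" marker are assumptions about what a block response looks like, not something confirmed in this thread.

```php
<?php
// Hypothetical helpers, not from the thread.

// Guess whether a response is a block page rather than real results.
// The status codes and the "captcha" marker are assumptions.
function looksBlocked(int $status, string $body): bool
{
    if ($status === 429 || $status === 503) {
        return true;
    }
    return stripos($body, 'captcha') !== false;
}

// Seconds to wait before retry number $attempt (0-based):
// doubles each time and is capped at $max.
function backoffSeconds(int $attempt, int $base = 2, int $max = 60): int
{
    return (int) min($max, $base * (2 ** $attempt));
}
```

Inside the fetch loop this would mean breaking out as soon as looksBlocked() returns true, and calling sleep(backoffSeconds($attempt)) before trying again later.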


I re-made it all and it still wasn't running, but as luck would have it I know it is connecting, because it suddenly popped up with the Google error CAPTCHA screen. Still nothing, though.

 

Exe.php

 

<?php
  class Spider {

public $place;
public $num;
public $result;
public $url;
public $html = "";
public $matching_data;
public $connect;

  
  function Connect() {

$place = "berkshire";
$interval = 10;
$limit = 995;

for ($i = 0; $i <= $limit; $i += $interval) {

$url = "http://maps.google.co.uk/maps?f=q&source=s_q&hl=en&geocode=&q=indian+takeaways+in+$place&sll=53.800651,-4.064941&sspn=12.657515,28.256836&ie=UTF8&view=text&ei=5-tmSqSDHZDBjAfBu_S5BQ&attrid=&oi=localspell&ct=clnk&cd=1&start=$i"; 

$connect = curl_init("$url");  

curl_setopt($connect, CURLOPT_RETURNTRANSFER, 1);  
curl_setopt($connect, CURLOPT_URL, $url);   
curl_setopt($connect, CURLOPT_FOLLOWLOCATION, true);    
curl_setopt($connect, CURLOPT_BINARYTRANSFER, 0);
curl_setopt($connect, CURLOPT_USERAGENT, 'Googlebot/2.1 (+http://www.google.com/bot.html)'); 
curl_setopt($connect, CURLOPT_REFERER, 'http://www.google.co.uk');
curl_setopt($connect, CURLOPT_SSL_VERIFYHOST, FALSE);
  
ob_start();
$this->html = curl_exec($connect);   

curl_error($connect);

curl_close($connect);  

return ($this->html);
   
  }
}

  function parse_array() {
      
preg_match_all("(<span id=title class=\"fn org\" dir=ltr>.*</span></span>&#8206;</div>)siU", $this->html, $matching_data);

$this->data = $matching_data[0];

return ($this->data);
}

function Info() {

foreach ($this->data as $result) {

$open = fopen('data.txt', "a+");
$string = preg_replace('~<div align=left><div class=rescat>Category:.*?<span><a class=f href=/maps?f=q&source=s_q&hl=en&geocode=&q=indian+delivery+in+berkshire&sll=52.173511,-0.411301&sspn=0.205075,0.441513&ie=UTF8&view=text&ei=tvJkSuGYAoKGjAekw8CRDA&attrid=&latlng=16738287191014861748&cd=1&dtab=2&pcsi=16738287191014861748,0 log=miwd id=nrev_A>1 review</a> - <a class=f nw href=https://www.google.com/accounts/ServiceLogin?service=local&hl=en&nui=1&continue=http://maps.google.co.uk/maps%3Ff%3Dq%26source%3Ds_q%26hl%3Den%26geocode%3D%26q%3Dindian%2Bdelivery%2Bin%2Bberkshire%26sll%3D52.173511,-0.411301%26sspn%3D0.205075,0.441513%26ie%3DUTF8%26view%3Dtext%26ei%3DtvJkSuGYAoKGjAekw8CRDA%26attrid%3D%26dtab%3D2%26cid%3D16738287191014861748%26iwd%3D1%26iwloc%3DA%26action%3Dopen log=miwd id=wrev_A>Write a review</a></span></div>~is','',$result);
$stringtop = preg_replace('~</a><span><a class=f href=.*?Write a review</a></span></div>~is',',',$string);
$stringbottom = preg_replace('~</a><span><a class="f nw" href=https://.*?Write a review</a></span></div>~is',',',$stringtop);
$stringmiddle = preg_replace('~</span>&#8206;<div><div>.*?<span class=adr id=adr dir=ltr>~is','',$stringbottom);
$stringfinal = preg_replace('~<div><a href=.*?</span></a>~is','',$stringmiddle);
$bored = preg_replace('~<span .*?>~is','',$stringfinal);
$verybored = preg_replace('~</sp.*?an>~is','',$bored);
$sleep = preg_replace('~</di.*?v>~is','',$verybored);
$close = preg_replace('~&#82.*?06;~is','',$sleep);
$to = preg_replace('~<b.*?>~is','',$close);
$death = preg_replace('~<di.*?v>~is','',$to);
$done = preg_replace('~</b.*?>~is','',$death);

$content = str_replace(" - ", ',', "$done");
$contentt = str_replace(",", '","', "$content");
$contents =  '("' . $contentt . '"),' . "\r\n";


$write = fwrite($open, $contents);

return $contents;
}
}
  }
?>

 

Test.php

 

<?php

include 'exe.php';

$spider = new Spider(); 

$connect = $spider->Connect($place);  

$linkarray = $spider->parse_array(); 

foreach ($linkarray as $result) { 

$contents = $spider->Info($result);

echo $contents . "\n";

  }

?>

 

I used

exit(print_r($VAR));

on the $connect var and the $linkarray var. The $connect var returns the number '1', and when I use it on $linkarray it returns 'Array ( ) 1'.

Any suggestions?

 

james.
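
A few things in the rewritten class stand out. Connect() has its return inside the for loop, so it comes back after the first page and never reaches start=10. Info() returns inside its foreach in the same way. Test.php passes $place to Connect() without ever defining it. The trailing 1 is most likely not data either: print_r() returns true after printing, and exit() on the PHP versions of that time prints an argument that is not an integer, so true shows up as 1. A lone 1 for $connect would then suggest the fetched body was empty. Below is a sketch of the loop with the return moved outside it. The names buildUrl and collectPages and the $fetch callback are invented for illustration, and the shortened query string is an assumption, not the full URL from the post.

```php
<?php
// Sketch only: all names here are illustrative, not from the thread.

// Build the URL for one results page. The query string is a shortened,
// assumed form of the one in the post.
function buildUrl(string $place, int $start): string
{
    return 'http://maps.google.co.uk/maps?q='
         . urlencode("indian takeaways in $place")
         . '&view=text&start=' . $start;
}

// Fetch every page and collect the bodies. The return sits after the
// loop, so all pages are visited rather than only the first one.
// $fetch is any callable that takes a URL and returns the page body.
function collectPages(callable $fetch, string $place, int $limit = 995, int $interval = 10): array
{
    $pages = [];
    for ($start = 0; $start <= $limit; $start += $interval) {
        $pages[] = $fetch(buildUrl($place, $start));
    }
    return $pages;
}
```

Passing the fetch step in as a callback also means the loop can be tried out with a fake fetcher before pointing it at the real site.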

