Remote login, source code scraper bot - Curl help needed.


Nuv

Hi guys,

 

I am making a bot that scrapes the source code of a site only AFTER logging in to it. The login script is:

 

<?php

$username = "xxx";
$password = "iwonttellyou";
$url      = "http://internet.com/login.php";
$cookie   = "cookie.txt";

// URL-encode the values so special characters survive the POST.
$postdata = "name=" . urlencode($username) . "&password=" . urlencode($password);

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false); // skips certificate checks; for testing only
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6");
curl_setopt($ch, CURLOPT_TIMEOUT, 60);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 0);  // do not follow redirects
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);  // return the response instead of printing it
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie); // cookies are written here when the handle is closed
curl_setopt($ch, CURLOPT_REFERER, $url);

curl_setopt($ch, CURLOPT_POSTFIELDS, $postdata);
curl_setopt($ch, CURLOPT_POST, 1);
$result = curl_exec($ch);

echo $result;

?>

 

I see a different session ID in cookie.txt every time I run this code, which makes me believe it's working. But what next?

How should I go to that site again, already logged in, and scrape the data? Some suggestions would be nice.

Have you tried doing another curl request using curl_exec() immediately afterwards, using the same $ch?

 

Yes, I have. It doesn't work.

 

If Perl is an option, WWW::Mechanize is more suited for this kind of task.

 

I'll look into it. I've never worked with Perl before.

Have you tried doing another curl request using curl_exec() immediately afterwards, using the same $ch?

 

Yes, I have. It doesn't work.

 

What happens?  And how do you determine the correct request to make after logging in?  Did you find it from the HTML source, from a snooping add-on like LiveHTTPHeaders, or some other method?
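
The thread never establishes why the second request failed, so the following is only a sketch of the usual pattern, not a diagnosis. It reuses one handle for the login POST and a follow-up GET, switches the handle back to GET with CURLOPT_HTTPGET, follows redirects, and records the HTTP status codes and any cURL error so that "what happens?" can be answered. The helper names sessionOptions() and loginAndFetch() are made up for illustration, and the form field names must match whatever the real login form uses.

```php
<?php
// Sketch only: helper names are hypothetical; URLs and field names are placeholders.

/** Build the cURL options shared by the login and the follow-up request. */
function sessionOptions(string $url, string $cookieFile): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,        // many login pages redirect on success
        CURLOPT_COOKIEJAR      => $cookieFile, // cookies are saved here when the handle is freed
        CURLOPT_COOKIEFILE     => $cookieFile, // previously saved cookies are loaded from here
        CURLOPT_TIMEOUT        => 60,
    ];
}

/** Log in, then fetch a second page with the same handle. */
function loginAndFetch(string $loginUrl, string $pageUrl, array $fields, string $cookieFile): array
{
    $ch = curl_init();
    curl_setopt_array($ch, sessionOptions($loginUrl, $cookieFile));
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query($fields)); // URL-encodes the values
    curl_exec($ch);
    $loginStatus = curl_getinfo($ch, CURLINFO_HTTP_CODE);

    // Second request: same handle, so the session cookie from the login is sent back.
    curl_setopt($ch, CURLOPT_HTTPGET, true); // switch the handle from POST back to GET
    curl_setopt($ch, CURLOPT_URL, $pageUrl);
    $body = curl_exec($ch);

    $result = [
        'login_status' => $loginStatus,
        'page_status'  => curl_getinfo($ch, CURLINFO_HTTP_CODE),
        'error'        => curl_error($ch), // empty string when the transfer succeeded
        'body'         => $body,
    ];
    unset($ch); // freeing the handle flushes the cookies to the jar file
    return $result;
}
```

With the variables from the original script, a call might look like loginAndFetch($url, "http://internet.com/members.php", ["name" => $username, "password" => $password], $cookie), where members.php is an invented page name. If page_status is 200 but the body is still the login form, the next thing to check is whether the POST fields and target URL match what the browser actually sends.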

 

If Perl is an option, WWW::Mechanize is more suited for this kind of task.

 

I'll look into it. I've never worked with Perl before.

 

The awesome thing about WWW::Mechanize is that it not only keeps track of your cookies, it also parses the HTML and lets you select links by name or link text, and lets you choose and submit a form without having to parse it yourself. People have tried to make an equivalent for PHP, but there's still no real alternative. At my workplace we call Perl scripts to do this sort of work, then pass the result back to PHP.
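
For anyone staying in PHP, the "select links by link text" part can be roughly approximated with the bundled DOM extension. This is a minimal sketch and nowhere near a WWW::Mechanize replacement: it does no cookie or form handling, and findLinksByText() is a made-up helper name, as is the sample HTML.

```php
<?php
// Sketch only: findLinksByText() is a hypothetical helper, not part of PHP or cURL.

/** Return the href of every <a> whose visible text contains $text (case-insensitive). */
function findLinksByText(string $html, string $text): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect parse warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $hrefs = [];
    foreach ($doc->getElementsByTagName('a') as $a) {
        if ($a->hasAttribute('href') && stripos($a->textContent, $text) !== false) {
            $hrefs[] = $a->getAttribute('href');
        }
    }
    return $hrefs;
}

// Sample markup standing in for a page fetched with curl_exec().
$html = '<html><body><a href="/inbox.php">My Inbox</a> <a href="/logout.php">Log out</a></body></html>';
print_r(findLinksByText($html, 'inbox'));
```

The returned hrefs are exactly as written in the page, so relative links still need to be resolved against the site's base URL before they are passed to the next curl request.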
