Remote login, source code scraper bot - Curl help needed.


Nuv

Hi guys,

 

I am making a bot that scrapes a site's source code only AFTER logging into the site. The login script is:

 

<?php

$username = "xxx";
$password = "iwonttellyou";
$url = "http://internet.com/login.php";
$cookie = "cookie.txt";

// URL-encode the values so characters such as & or = cannot break the POST body.
$postdata = "name=" . urlencode($username) . "&password=" . urlencode($password);

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
// Disables certificate verification; acceptable for testing only.
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6");
curl_setopt($ch, CURLOPT_TIMEOUT, 60);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 0);
// Return the response as a string instead of printing it directly.
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
// Cookies received from the server are saved to this file.
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie);
curl_setopt($ch, CURLOPT_REFERER, $url);

curl_setopt($ch, CURLOPT_POSTFIELDS, $postdata);
curl_setopt($ch, CURLOPT_POST, 1);
$result = curl_exec($ch);

echo $result;

?>

 

I can see a different session ID in cookie.txt every time I run this code, which makes me believe it's working. But what next?

How should I request the site again, already logged in, and scrape the data? Some suggestions would be nice.

Have you tried doing another curl request using curl_exec() immediately afterwards, using the same $ch?

 

Yes, I have. It doesn't work.
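When reusing the same handle does not work, one thing worth checking is whether the second request is still being sent as a POST with the login fields attached, and whether cookies are being read back as well as written. The following is only a minimal sketch under those assumptions, not a confirmed fix for this site: the function names `build_post_body` and `fetch_after_login` and the page URL passed in are hypothetical, and the form field names are taken from the script above.

```php
<?php
// Hypothetical helper: builds the login POST body from the field names
// used in the original script ("name" and "password").
function build_post_body(string $username, string $password): string
{
    // http_build_query URL-encodes the values for us.
    return http_build_query(['name' => $username, 'password' => $password]);
}

// Hypothetical helper: logs in, then fetches a second page on the same handle.
function fetch_after_login(string $loginUrl, string $pageUrl, string $postBody, string $cookieFile)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_COOKIEJAR, $cookieFile);   // where cookies are saved
    curl_setopt($ch, CURLOPT_COOKIEFILE, $cookieFile);  // where cookies are read from

    // Step 1: send the login form as a POST request.
    curl_setopt($ch, CURLOPT_URL, $loginUrl);
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, $postBody);
    if (curl_exec($ch) === false) {
        throw new RuntimeException('Login request failed: ' . curl_error($ch));
    }

    // Step 2: switch the handle back to GET before requesting the page,
    // otherwise the login fields would be posted again.
    curl_setopt($ch, CURLOPT_URL, $pageUrl);
    curl_setopt($ch, CURLOPT_HTTPGET, true);
    curl_setopt($ch, CURLOPT_REFERER, $loginUrl);

    // Returns the page source as a string, or false on failure.
    return curl_exec($ch);
}
```

A call might look like `fetch_after_login($url, "http://internet.com/members.php", build_post_body($username, $password), $cookie)`, where `members.php` is a made-up page name. If the site redirects after login, the page to request may differ from what is expected.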

 

If Perl is an option, WWW::Mechanize is more suited for this kind of task.

 

I'll look into it. I've never worked with Perl before.

Have you tried doing another curl request using curl_exec() immediately afterwards, using the same $ch?

 

Yes, I have. It doesn't work.

 

What happens?  And how do you determine the correct request to make after logging in?  Did you find it from the HTML source, from a snooping add-on like LiveHTTPHeaders, or some other method?
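To answer "what happens?" from the PHP side, it can help to look at the status code, redirect target and raw headers of the response rather than only the body. The following is a rough sketch for that kind of inspection; the function names `split_response` and `describe_response` are made up for illustration, and it assumes the cookie file written by the login script already exists.

```php
<?php
// Hypothetical helper: splits a raw response (headers + body) at the
// header size reported by cURL.
function split_response(string $raw, int $headerSize): array
{
    return [
        'headers' => substr($raw, 0, $headerSize),
        'body'    => (string) substr($raw, $headerSize),
    ];
}

// Hypothetical helper: fetches a URL with the saved cookies and reports
// what the server answered, without following redirects.
function describe_response(string $url, string $cookieFile): array
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_HEADER, true);            // include headers in the output
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false);   // show redirects instead of following them
    curl_setopt($ch, CURLOPT_COOKIEFILE, $cookieFile); // send the cookies saved at login

    $raw = curl_exec($ch);
    if ($raw === false) {
        throw new RuntimeException('Request failed: ' . curl_error($ch));
    }

    $parts = split_response($raw, curl_getinfo($ch, CURLINFO_HEADER_SIZE));
    $parts['status']   = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    $parts['redirect'] = curl_getinfo($ch, CURLINFO_REDIRECT_URL);
    return $parts;
}
```

A redirect back to the login page, or a missing `Cookie` line when compared against what a browser sends, would suggest the session is not being carried over.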

 

If Perl is an option, WWW::Mechanize is more suited for this kind of task.

 

I'll look into it. I've never worked with Perl before.

 

The awesome thing about WWW::Mechanize is that it will not only keep track of your cookies, it will also parse the HTML and let you select links by name or link text, and let you choose and submit a form without requiring you to parse it yourself. People have tried to make an equivalent for PHP, but there's still no real alternative. At my workplace we call Perl scripts to do this sort of work, then pass the result back to PHP.
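For anyone staying in PHP, a small part of that behaviour (picking a link by its text) can be approximated with the built-in DOM extension. This is only a sketch and not a substitute for WWW::Mechanize: the function name `find_link_by_text` is hypothetical, it matches the link text exactly after trimming whitespace, and it does nothing about forms or cookies.

```php
<?php
// Hypothetical helper: returns the href of the first <a> element whose
// trimmed text equals $linkText, or null if there is no such link.
function find_link_by_text(string $html, string $linkText): ?string
{
    $doc = new DOMDocument();

    // Scraped HTML is rarely valid, so collect parser warnings quietly
    // instead of printing them, then restore the previous setting.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('a') as $anchor) {
        if (trim($anchor->textContent) === $linkText) {
            return $anchor->getAttribute('href');
        }
    }
    return null;
}
```

The returned href may be relative, so it would still need to be resolved against the page URL before being passed to the next curl request.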
