Scrape Site that Requires Username/Password Login


usefulphp

I wrote a script on a localhost WAMP setup that collects the HTML source (using file_get_contents($url) and regex) from my favorite sites and puts it all together for my own personal use, so I don't have to visit each site manually to read everything I need.

   However, it does not work if the site has a username/password login form (html form, not Apache "basic authentication").

   What PHP code can: 1) submit my username and password to the form; 2) send an Internet Explorer user agent string, so the site thinks the script is that browser (in case the site requires it); and 3) accept and store any cookies the site needs to work properly?

  • 8 months later...

cURL lets you log in to a site by sending the login form's fields as POST data.
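
As a rough sketch of what that could look like, the code below POSTs a username and password to a login form, sends an Internet Explorer style user agent string, and keeps any cookies in a file. The login URL, the form field names (`username`, `password`), the cookie file path, the exact user agent string, and the helper names `login_options()` and `site_login()` are all assumptions for illustration; the real URL and field names have to be read from the site's own login form HTML.

```php
<?php
// Hypothetical helper: builds the cURL options for a form login.
// Nothing here is specific to any real site.
function login_options(string $loginUrl, array $fields, string $cookieFile): array
{
    return [
        CURLOPT_URL            => $loginUrl,
        CURLOPT_POST           => true,
        // URL-encode the fields the same way a browser submits an HTML form
        CURLOPT_POSTFIELDS     => http_build_query($fields),
        // Example user agent string only; adjust to whatever the site expects
        CURLOPT_USERAGENT      => 'Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1)',
        // Cookies received from the site are written to this file
        CURLOPT_COOKIEJAR      => $cookieFile,
        // Cookies already in the file are sent with the request
        CURLOPT_COOKIEFILE     => $cookieFile,
        // Many login forms redirect after a successful POST
        CURLOPT_FOLLOWLOCATION => true,
        // Return the response as a string instead of printing it
        CURLOPT_RETURNTRANSFER => true,
    ];
}

// Hypothetical helper: performs the login request.
// Returns the response HTML, or false if the request failed.
function site_login(string $loginUrl, array $fields, string $cookieFile)
{
    $ch = curl_init();
    curl_setopt_array($ch, login_options($loginUrl, $fields, $cookieFile));
    $html = curl_exec($ch);
    curl_close($ch);
    return $html;
}

// Usage (assumed URL, field names and path):
// $jar = sys_get_temp_dir() . '/scraper_cookies.txt';
// site_login('http://anydomain.com/login.php',
//            ['username' => 'me', 'password' => 'secret'], $jar);
```

Some login forms also include hidden fields (tokens and the like) that would need to be added to the field array, so it is worth checking the form's source before relying on this.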

 

Do you think you might be willing to explain to a noob how this might be possible?

 

This is the script I am currently using to scrape the data...

 

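<?php
$url = 'http://anydomain.com/';
$ch = curl_init($url);
// Return the page as a string; without this option curl_exec() prints
// the page itself and returns true, so the echo below would print "1"
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$curl_scraped_page = curl_exec($ch);
curl_close($ch);
echo $curl_scraped_page;
?>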

 

Wouldn't I need to be able to store the session variables somehow? If so, how would I do this without actually browsing to the site manually?
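
For what it's worth, the session on the remote site is normally tied to a cookie, so one possible approach is to have cURL keep a cookie file and reuse it on every request, with no manual browsing involved. A minimal sketch, assuming hypothetical helpers `cookie_options()` and `fetch_with_cookies()` and a cookie file path of your own choosing:

```php
<?php
// Hypothetical helper: options for fetching a page while reusing stored cookies.
function cookie_options(string $url, string $cookieFile): array
{
    return [
        CURLOPT_URL            => $url,
        // Send the cookies saved by earlier requests (e.g. the session cookie)
        CURLOPT_COOKIEFILE     => $cookieFile,
        // Save new or updated cookies back to the same file
        CURLOPT_COOKIEJAR      => $cookieFile,
        // Return the page as a string instead of printing it
        CURLOPT_RETURNTRANSFER => true,
    ];
}

// Hypothetical helper: fetches a page with the stored cookies.
// Returns the page HTML, or false if the request failed.
function fetch_with_cookies(string $url, string $cookieFile)
{
    $ch = curl_init();
    curl_setopt_array($ch, cookie_options($url, $cookieFile));
    $html = curl_exec($ch);
    curl_close($ch);
    return $html;
}

// Usage (assumed cookie file path):
// $jar = sys_get_temp_dir() . '/scraper_cookies.txt';
// echo fetch_with_cookies('http://anydomain.com/', $jar);
```

As long as every request points at the same cookie file, the site should see them as one session, at least until the session expires on its side.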

 

Any help will do...

 

Archived

This topic is now archived and is closed to further replies.
