Scrape Site that Requires Username/Password Login


usefulphp

I wrote a script on a localhost WAMP setup that collects the HTML source (using file_get_contents($url) and regex) from my favorite sites and puts it all together for my own personal use, so I don't have to visit each site manually to read everything I need.

However, it does not work if the site has a username/password login form (an HTML form, not Apache "basic authentication").

What PHP code can: 1) submit my username and password to the form; 2) make the site think the script is an Internet Explorer user agent (in case the site requires that); and 3) accept and store any cookies that the site needs to work properly?


  • 8 months later...

cURL lets you log in to a site by sending the login form's fields as POST data.
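As a rough sketch of what that could look like: the functions below build the cURL options for a form login, set an Internet Explorer style user agent, and save the cookies the site sends back. The function names, the login URL, the field names (`username`, `password`) and the cookie file path are all assumptions made for illustration; the real URL and field names have to be read from the site's own `<form>` markup.

```php
<?php
// Hypothetical helper: builds the cURL options for a form login.
// $fields must use the input names from the site's <form>; the ones
// shown in the usage note are placeholders.
function build_login_options(string $loginUrl, array $fields, string $cookieFile): array
{
    return [
        CURLOPT_URL            => $loginUrl,
        CURLOPT_POST           => true,                        // send a POST request
        CURLOPT_POSTFIELDS     => http_build_query($fields),   // URL-encoded form body
        CURLOPT_USERAGENT      => 'Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1)',
        CURLOPT_COOKIEFILE     => $cookieFile,                 // enable cookie handling, read saved cookies
        CURLOPT_COOKIEJAR      => $cookieFile,                 // write cookies here when the handle closes
        CURLOPT_FOLLOWLOCATION => true,                        // follow the redirect after login
        CURLOPT_RETURNTRANSFER => true,                        // return the page instead of printing it
    ];
}

// Hypothetical helper: performs the login and returns the response HTML,
// or false if the request failed.
function login(string $loginUrl, array $fields, string $cookieFile)
{
    $ch = curl_init();
    curl_setopt_array($ch, build_login_options($loginUrl, $fields, $cookieFile));
    $html = curl_exec($ch);
    curl_close($ch);
    return $html;
}
```

Usage would be something like `login('http://anydomain.com/login.php', ['username' => 'me', 'password' => 'secret'], __DIR__ . '/cookies.txt');`. Some sites also expect hidden form fields (for example a token) to be posted back, which this sketch does not handle.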

 

Do you think you might be willing to explain to a noob how this might be possible?

 

This is the script I am currently using to scrape the data...

 

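<?php
$url = 'http://anydomain.com/';
$ch = curl_init($url);
// Return the response as a string instead of printing it directly;
// without this, curl_exec() outputs the page itself and returns true.
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$curl_scraped_page = curl_exec($ch);
curl_close($ch);
echo $curl_scraped_page;
?>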

 

Wouldn't I need to be able to store the session variables somehow? If so, how would I do this without actually browsing to the site manually?
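One way this is commonly handled, sketched here under assumptions: a site usually tracks the login session with a cookie, and cURL can keep that cookie in a file via `CURLOPT_COOKIEJAR` (write) and `CURLOPT_COOKIEFILE` (read). If an earlier login request saved its cookies to a file, later requests can send them back from the same file. The function names and the cookie file path below are hypothetical.

```php
<?php
// Hypothetical helper: options for fetching a page while reusing
// cookies that an earlier login request stored in $cookieFile.
function build_fetch_options(string $url, string $cookieFile): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_COOKIEFILE     => $cookieFile, // send the cookies saved earlier
        CURLOPT_COOKIEJAR      => $cookieFile, // save any cookies the site updates
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_RETURNTRANSFER => true,        // return the page as a string
    ];
}

// Hypothetical helper: returns the page HTML, or false on failure.
function fetch_logged_in(string $url, string $cookieFile)
{
    $ch = curl_init();
    curl_setopt_array($ch, build_fetch_options($url, $cookieFile));
    $html = curl_exec($ch);
    curl_close($ch);
    return $html;
}
```

For example, `fetch_logged_in('http://anydomain.com/members.php', __DIR__ . '/cookies.txt')` would request a page with whatever session cookie is in that file. Nothing has to be browsed manually, but the cookie file should be kept outside the web root, since anyone who can read it can reuse the session.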

 

Any help will do...

 
