Trying to use cURL and Simple HTML DOM to automatically log into Invision Power Board


Dustin013

I am trying to get this to work. I would like to log into my forum with cURL, access the content within the forums, and then convert that data into something Simple HTML DOM can understand so that I can parse and scrape it. The script below successfully logs in to an Invision Power Board I have installed on my testing machine. If I echo out any of the content, it shows me the "logging in" page and then forwards me to the domain. So before I work out any of the parsing, I simply need to be able to tell the script how to log in and then go to the correct forum, so that it can begin collecting the page data for parsing.

I really am not sure where I am going wrong, but it does log you in and then forwards the page to the target domain from wherever it's run.

Any suggestions? I was thinking about possibly using stream_context_create, but I could use some advice.

Thanks in advance!

<?php

/*
* 
*  The idea of this script is to scrape / parse data from a member protected forum run on Invision Power Board
*  This is not a hack, but rather an information gathering tool to a forum you already have access to.
*  The script uses simple_html_dom and cURL. First the user is logged into the site using cURL, 
*  then directed to the correct forum ID, where data can then be scraped / parsed and organized.
* 
*/

ini_set('display_errors',1); // Turn error reporting on
error_reporting(E_ALL|E_STRICT); // All errors displayed

include_once('simple_html_dom.php'); // Simple_HTML_DOM (http://simplehtmldom.sourceforge.net/)

// Config
//////////////
$url = "http://****.com/ipb/"; // Target URL with a trailing slash!
$username = "testing"; // Your forum username
$password = "testing"; // Your forum password
$forum_id = "25"; // The ID of the target forum you want to be logged into
//////////////
// End Config

// Post Data
//////////////
$curlPost = "index.php?act=Login&CODE=01&referer=".urlencode($url)."index.php%3F&UserName=".$username."&PassWord=".$password."&CookieDate=1&showforum=".$forum_id;
echo "curlPost :".$curlPost."<br />";
// Start cURL
//////////////
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url); // $url is target URL
curl_setopt($ch, CURLOPT_HEADER, 1); // return headers
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.10) Gecko/2009042316 Firefox/3.0.10'); // header value only, no "User-Agent:" prefix
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_COOKIEJAR, 'cookie.txt'); // Use cookie.txt for STORING cookies
curl_setopt($ch, CURLOPT_POST, true); // Tell curl that we are posting data
curl_setopt($ch, CURLOPT_POSTFIELDS, $curlPost); // send post data
$html = curl_exec($ch); // Execute! $html contains curl data!
curl_close($ch); // Free the memory
//////////////
// End cURL

// Convert $html into $content for use with simple_html_dom
$content = str_get_html($html); // Convert String

// Find all article blocks
foreach($content->find('div.maintitle') as $content) { // so we are looking for anything in the content of a div with a class of maintitle
    $item['title']     = $article->find('div.maintitle', 0)->plaintext; // all this data is to be in plain text
    $content[] = $item; // place into array
}

// Testing Output
////////////////////////////
// You guessed it.. if you echo either of these out you get forwarded
// to the index.php of the target URL, even though you do get logged in
// echo $item;
// echo $content;
// echo '<pre>';
// print_r($content);
// echo '</pre>';
////////////////////////////
// End Testing Output

// Loop through the array to get your data
foreach($content as $item){
	// ya ya ya
	foreach ($item as $key => $value){
		// output contents of array
		echo $key.' : '.$value.'<br />'; // returns "title : %contents of div%"
	}
	echo '<br />'; // space 'em out
}

$content->clear(); // free up memory
unset($content); // cause you got to
?>

Run as is, the script outputs the following before redirecting to the next page:

curlPost :index.php?act=Login&CODE=01&referer=http%3A%2F%2Fwww.****.com%2Findex.php%3F&UserName=******&PassWord=******&CookieDate=1&showforum=25

nodetype : 5
tag : root
attr : Array
children : Array
nodes : Array
parent :
_ : Array

0 : HTTP/1.1 200 OK
Date: Thu, 04 Jun 2009 02:29:23 GMT
Server: Apache
Set-Cookie: ipbsession_id=3ea9501cb4aeb7c3aaa5f6044558ac71; path=/; domain=.*****.com; httponly
Set-Cookie: ipbipb_stronghold=cdd5a86c4b0dd10032f8ddf41d7443c4; expires=Fri, 04-Jun-2010 02:29:23 GMT; path=/; domain=.*****.com; httponly
Set-Cookie: ipbmember_id=******; expires=Fri, 04-Jun-2010 02:29:23 GMT; path=/; domain=******.com; httponly
Set-Cookie: ipbpass_hash=92942a37*****257d721b86d6abf6d55; expires=Sat, 04-Jul-2009 02:29:23 GMT; path=/; domain=.*****.com; httponly
Set-Cookie: ipbcoppa=0; path=/; domain=.*****.com
Set-Cookie: ipbsession_id=2c7ce04ec93d904219a08c31526256d3; path=/; domain=.******.com; httponly
Vary: Accept-Encoding
Connection: close
Transfer-Encoding: chunked
Content-Type: text/html
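Reading the output above, the last loop appears to be iterating over the simple_html_dom object's own properties: the first `foreach` found no `div.maintitle` on the "logging in" page, so `$content` was never replaced with an array of items. That would explain why it prints the root node's fields (`nodetype : 5`, `tag : root`, ...) followed by the raw nodes, the first of which is the response headers (present because `CURLOPT_HEADER` is set to 1). Echoing the remaining raw HTML is likely what makes the browser follow the page's redirect. The loop also reads from `$article`, which is never defined. A minimal sketch of the intended extraction follows, collecting titles into a separate array. It swaps in PHP's built-in DOMDocument/DOMXPath for simple_html_dom so it runs without the include, assumes the titles live in `div.maintitle` as the question's selector does, and uses the hypothetical helper names `extractMainTitles()` and `printTitles()`.

```php
<?php
// Sketch: collect the text of every <div class="maintitle"> in a page.
// Uses PHP's built-in DOM extension in place of simple_html_dom (a substitution).
// extractMainTitles() and printTitles() are hypothetical helper names.
function extractMainTitles(string $html): array
{
    if (trim($html) === '') {
        return [];                       // loadHTML() rejects empty input
    }

    $doc = new DOMDocument();
    libxml_use_internal_errors(true);    // forum markup is rarely valid; silence warnings
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    // XPath equivalent of the CSS selector div.maintitle (class list contains "maintitle")
    $query = '//div[contains(concat(" ", normalize-space(@class), " "), " maintitle ")]';

    $titles = [];                        // separate results array, not $content
    foreach ($xpath->query($query) as $div) {
        $titles[] = ['title' => trim($div->textContent)];
    }
    return $titles;
}

// Print each result as "key : value"; htmlspecialchars() makes the browser
// display any markup in the scraped text instead of acting on it.
function printTitles(array $titles): void
{
    foreach ($titles as $item) {
        foreach ($item as $key => $value) {
            echo htmlspecialchars($key) . ' : ' . htmlspecialchars($value) . "<br />\n";
        }
        echo "<br />\n";
    }
}
```

Once `$html` holds a page from inside the forum rather than the login redirect, `printTitles(extractMainTitles($html));` would stand in for both loops in the question's script.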

Your parameters are in the format of a GET request, not POST. Here is a function that lets you build the POST fields from an array: each key is the field name and each value is the input value. The file index.php should be part of the URL, not the POST data.

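<?php
// Build a URL-encoded POST string from an array:
// each key is the field name, each value is the input value.
function postString($dataArray) {
	$tempString = array();
	foreach ($dataArray as $key => $value) {
		if (strlen(trim($value)) > 0) {
			$tempString[] = urlencode($key) . "=" . urlencode($value);
		}
		else {
			// empty fields are sent as a bare field name
			$tempString[] = urlencode($key);
		}
	}
	return implode('&', $tempString);
}

// url: index.php belongs here, not in the POST data
$target = "http://****.com/ipb/index.php";
// post data
$postArray['UserName'] 	= "joe";
$postArray['PassWord'] 	= "bloggs";
$postArray['act'] 		= "Login";
$postArray['CODE'] 		= "01"; // as in the question's original login string

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $target);
curl_setopt($ch, CURLOPT_POSTFIELDS, postString($postArray));
curl_setopt($ch, CURLOPT_POST, TRUE);
// ...then set the remaining options and call curl_exec() as in the question's script
?>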
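For comparison, PHP's built-in `http_build_query()` does the same job as `postString()`: it URL-encodes each key and value and joins the pairs with `&` (one difference is that it sends an empty value as `key=` rather than a bare key). The question also needs a second request for the forum itself, and that request has to send back the cookies set by the login. A minimal sketch of both steps on one cURL handle follows, assuming the field names used in this thread (`act`, `CODE=01`, `UserName`, `PassWord`, `CookieDate`) and the `showforum` parameter from the question's string; `buildLoginFields()`, `fetchForumPage()` and the default `cookie.txt` path are hypothetical.

```php
<?php
// Sketch: log in with one cURL POST, then fetch a forum page with a second
// GET on the same handle so the login cookies are sent back.
// buildLoginFields() and fetchForumPage() are hypothetical helper names; the
// field names are taken from this thread and are assumptions about the board.

function buildLoginFields(string $username, string $password): string
{
    // http_build_query() URL-encodes keys and values and joins them with '&'
    return http_build_query([
        'act'        => 'Login',
        'CODE'       => '01',
        'UserName'   => $username,
        'PassWord'   => $password,
        'CookieDate' => '1',
    ]);
}

function fetchForumPage(string $boardUrl, string $username, string $password,
                        int $forumId, string $cookieFile = 'cookie.txt'): string
{
    $index = rtrim($boardUrl, '/') . '/index.php';      // index.php goes in the URL

    $ch = curl_init();
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_COOKIEJAR, $cookieFile);   // save cookies when the handle closes
    curl_setopt($ch, CURLOPT_COOKIEFILE, $cookieFile);  // load any previously saved cookies

    // Request 1: POST the login fields
    curl_setopt($ch, CURLOPT_URL, $index);
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, buildLoginFields($username, $password));
    if (curl_exec($ch) === false) {
        $error = curl_error($ch);
        curl_close($ch);
        throw new RuntimeException("Login request failed: $error");
    }

    // Request 2: switch back to GET and ask for the forum page;
    // the handle still holds the cookies set by request 1
    curl_setopt($ch, CURLOPT_HTTPGET, true);
    curl_setopt($ch, CURLOPT_URL, $index . '?showforum=' . $forumId);
    $html = curl_exec($ch);
    $error = curl_error($ch);
    curl_close($ch);

    if ($html === false) {
        throw new RuntimeException("Forum request failed: $error");
    }
    return $html;
}
```

For example, `fetchForumPage('http://****.com/ipb/', $username, $password, 25)` would return the forum page's HTML, ready for `str_get_html()` or any other parser. Reusing one handle keeps the session cookies in cURL's cookie engine between the two requests, while `CURLOPT_COOKIEJAR` and `CURLOPT_COOKIEFILE` let a later run pick the saved session up from disk. The question's script makes only the login request, so it never asks for the forum page at all.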
