
How to retrieve information from other websites and store?


jadoo1989


I am working on a search system that will let me search user names and profile data from a gaming website. I'm quite new to PHP and learning fairly quickly; however, I'm not sure how to realistically go about this task, and I hope one of you can point me toward what I need to read up on. Basically, I'd like my application to authenticate itself to this website and then search for profiles. Once the profiles in question are found, it would archive the data or display it to the person searching.

 

I hope I've made myself clear in this and I hope that you guys can help me.


For someone who's "quite new to PHP", this may not be the easiest thing.

 

cURL: To connect to another site you'll need to use cURL (or fsockopen()), which lets you post values (i.e. username and password).

RegEx: To extract the details you'll probably want to use regular expressions.

MySQL: To store the details, a simple database (e.g. MySQL) would work well, as you could run a simple query to check whether you already have the data.
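To make the storage step concrete, here is a minimal sketch using PDO with prepared statements: it checks whether a user is already stored and only inserts if not. The `profiles` table, its columns, and the helper functions are hypothetical; the usage part uses an in-memory SQLite database only so the snippet runs standalone, and for MySQL you would swap in a `mysql:` DSN plus your credentials.

```php
<?php
// Minimal sketch: store a scraped profile only if we don't already have it.
// Table and column names are hypothetical; adjust them to your own schema.

function createProfilesTable(PDO $db): void
{
    // This CREATE TABLE syntax works on both MySQL and SQLite.
    $db->exec('CREATE TABLE IF NOT EXISTS profiles (
        username VARCHAR(64) PRIMARY KEY,
        profile_data TEXT
    )');
}

// Returns true if a new row was inserted, false if the user was already stored.
function storeProfileIfNew(PDO $db, string $username, string $profileData): bool
{
    // The "simple search" step: do we already have this user?
    $check = $db->prepare('SELECT 1 FROM profiles WHERE username = ?');
    $check->execute([$username]);
    if ($check->fetchColumn() !== false) {
        return false;
    }

    // Prepared statements keep scraped text from breaking (or injecting into) the SQL.
    $insert = $db->prepare('INSERT INTO profiles (username, profile_data) VALUES (?, ?)');
    $insert->execute([$username, $profileData]);
    return true;
}

// Usage: in-memory SQLite so this runs standalone;
// swap the DSN and credentials for your MySQL server.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
createProfilesTable($db);

$first  = storeProfileIfNew($db, 'some_player', 'example profile data');
$second = storeProfileIfNew($db, 'some_player', 'example profile data');
```

Doing the check in PHP keeps the logic easy to follow; with a primary key on `username` you could instead rely on the database to reject duplicates.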

 

Now, if you are writing this to scrape data from one site it shouldn't be too much trouble (note: check the T&Cs to make sure you're not breaking the rules), but you'll need to put real thought into the planning if you want it to work for multiple sites.



I want it to work for just one site. Thanks for all your help, I'll certainly read up on the topics mentioned above. I've already read through the T&Cs and there's no rule against this, although they do have a click limit or whatever. Several other gaming clans have a system similar to the one I want to implement, but I'd like to build one from the ground up that I can make my own and customize.

 

It might not be the easiest thing but it's undeniably going to be a great learning experience! :P
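Since the site has a click limit, it's worth spacing out your requests. A minimal sketch, assuming a fixed minimum gap between requests is enough to stay under that limit (the actual limit isn't stated, so `$minInterval` is a guess you'd tune). `fetchAllThrottled()` is a hypothetical helper, and `$fetch` stands in for whatever does the real request (file_get_contents, a cURL call, etc.).

```php
<?php
// Minimal sketch: call $fetch for each URL, waiting at least $minInterval
// seconds between the start of consecutive requests.
function fetchAllThrottled(array $urls, callable $fetch, float $minInterval): array
{
    $results = [];
    $lastStart = null;

    foreach ($urls as $url) {
        if ($lastStart !== null) {
            // How long since the previous request started?
            $elapsed = microtime(true) - $lastStart;
            if ($elapsed < $minInterval) {
                // usleep() takes microseconds.
                usleep((int) ceil(($minInterval - $elapsed) * 1000000));
            }
        }
        $lastStart = microtime(true);
        $results[$url] = $fetch($url);
    }

    return $results;
}

// Usage with a fake fetcher so no real requests are made.
$start = microtime(true);
$pages = fetchAllThrottled(
    ['page1', 'page2', 'page3'],
    fn (string $url): string => "content of $url",
    0.05
);
$duration = microtime(true) - $start;
```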


Okay cool, remember we have a RegEx section here (a subsection of PHP Help).

As for cURL, you might get away with file_get_contents instead.

Here's a very quick and simple example:

 

<?php
//Get data from this page (no login required, so no cURL needed)
$data = file_get_contents("http://www.phpfreaks.com/forums/index.php/topic,260169.0.html");
if ($data === false) {
	die("Could not fetch the page\n");
}

//$data is the same as viewing the source; if you view this source you'll see "action=profile;u=47001" and "action=profile;u=84706", which are the links to our profiles
//Find the text between action=profile;u={any number}'> and the next <
preg_match_all('/action=profile;u=\d+\'>([^<]*)</', $data, $Users);
$Users = $Users[1]; //[1] holds the first capture group (the names)

//display or insert found data
foreach($Users as $User)
{
	//INSERT INTO DATABASE
	//escape the name before echoing it back into HTML
	echo htmlspecialchars($User) . "<br />\n";
}
?>

 

EDIT: added some comments
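Regular expressions are fine for a quick job like this, but they break easily when the markup changes slightly. A different technique worth knowing is PHP's DOM extension with an XPath query. This sketch assumes profile links contain `action=profile;u=` in their `href`, as in the page above; `extractProfileNames()` is a hypothetical helper.

```php
<?php
// Sketch: pull profile names out of HTML with DOMDocument + XPath instead of a regex.
function extractProfileNames(string $html): array
{
    if ($html === '') {
        return [];
    }
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Every <a> whose href contains the profile-link pattern.
    $links = $xpath->query('//a[contains(@href, "action=profile;u=")]');

    $names = [];
    foreach ($links as $link) {
        $names[] = trim($link->textContent);
    }
    return $names;
}

$sample = '<p><a href="index.php?action=profile;u=47001">Alice</a>'
        . ' and <a href="index.php?action=profile;u=84706">Bob</a>'
        . ' <a href="index.php?action=help">Help</a></p>';

$names = extractProfileNames($sample);
```

Unlike the regex, this doesn't care about attribute quoting or extra attributes on the link.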


Yeah, you'll need to use cURL then.

 

Here's a function I wrote for a basic login. It was only for testing, but it works; I made a few tweaks so it runs as a single function, but it's something to play with.

 

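<?php
function Login()
{
	$ch = curl_init('http://domain.com/users.php?act=login-d');
	$ckfile = sys_get_temp_dir()."/CURLCOOKIE.txt";
	//Save the session cookies to $ckfile, and send any already saved
	curl_setopt($ch, CURLOPT_COOKIEJAR, $ckfile);
	curl_setopt($ch, CURLOPT_COOKIEFILE, $ckfile);
	//Return the page as a string instead of printing it
	curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
//Login (an array is sent as multipart/form-data; wrap it in http_build_query() if the site expects a normal url-encoded form)
	curl_setopt($ch, CURLOPT_POSTFIELDS, array(
		'username'=>"My UserName",
		'password'=>'My Password',
		'login'=>'Login',
		'remember'=>'0'
		)
	);
	curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)');
	$output = curl_exec($ch);
	//curl_exec() returns false if the request itself failed
	if ($output === false) {
		trigger_error('Login request failed: ' . curl_error($ch), E_USER_WARNING);
	}
	//release the handle; cookies are written to the jar once it is freed
	curl_close($ch);
	return $output;
}
?>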

 

Login() returns the HTML the site sends back after the login request (or false if the request fails).
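Because Login() just hands back whatever HTML the site returns, it's worth checking that you're really logged in before you start searching profiles. A minimal sketch, assuming the logged-in page shows text that only members see, such as a "Logout" link; the marker string and the `isLoggedIn()` helper are hypothetical, so pick something that actually appears on the site's member pages.

```php
<?php
// Sketch: decide whether the HTML returned by Login() looks like a logged-in page.
// $html may be a string or false (curl_exec() returns false on failure).
// $marker is an assumption: use text that only appears when you're logged in.
function isLoggedIn($html, string $marker = 'Logout'): bool
{
    if (!is_string($html)) {
        return false;
    }
    // Case-insensitive search for the marker text.
    return stripos($html, $marker) !== false;
}

// Usage (needs the real site):
// $html = Login();
// if (!isLoggedIn($html)) { die("Login failed\n"); }
```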

 

You need to update the URL:

$ch = curl_init('http://domain.com/users.php?act=login-d');

 

and the fields:

curl_setopt($ch, CURLOPT_POSTFIELDS, array(
		'username'=>"My UserName",
		'password'=>'My Password',
		'login'=>'Login',
		'remember'=>'0'
		)
	);

 

Just view the source of the login page, read the form's fields from the HTML, and build the array above to suit.
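Rather than copying the field names by hand, you can let PHP read them from the login form. This sketch assumes you have already fetched the login page's HTML (with cURL or file_get_contents) and that the login form is the first `<form>` on the page; `formFieldsFromHtml()` and the sample form are hypothetical. It also picks up hidden inputs (for example a token the site expects to be posted back), which are easy to miss when reading the source.

```php
<?php
// Sketch: collect name => value pairs from the first <form> in a page,
// so they can be merged with your username/password for CURLOPT_POSTFIELDS.
function formFieldsFromHtml(string $html): array
{
    if ($html === '') {
        return [];
    }
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $fields = [];
    // Named <input> elements inside the first form on the page.
    foreach ($xpath->query('(//form)[1]//input[@name]') as $input) {
        $type = strtolower($input->getAttribute('type'));
        // A browser doesn't submit unchecked checkboxes/radios, so skip them.
        if (in_array($type, ['checkbox', 'radio'], true) && !$input->hasAttribute('checked')) {
            continue;
        }
        $fields[$input->getAttribute('name')] = $input->getAttribute('value');
    }
    return $fields;
}

$loginPage = '<form action="users.php?act=login-d" method="post">'
    . '<input type="text" name="username">'
    . '<input type="password" name="password">'
    . '<input type="hidden" name="token" value="abc123">'
    . '<input type="checkbox" name="remember" value="1">'
    . '<input type="submit" name="login" value="Login">'
    . '</form>';

// Fill in your own credentials on top of whatever the form already contains.
$post = array_merge(formFieldsFromHtml($loginPage), [
    'username' => 'My UserName',
    'password' => 'My Password',
]);
```

The resulting `$post` array could then be passed to CURLOPT_POSTFIELDS in place of the hand-written one.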

 

Okay, I think you have everything you need.

 

So I'll wish you luck; if you have any problems, just ask :)

 

EDIT: And welcome to PHP Freaks :P
