[SOLVED] Parsing Data From Remote Sites


I want to take titles, descriptions, and content from another website and display them on mine. Can somebody explain how to do this? I've read that you can do it with fopen and preg_match, but I'm not sure how to implement it.



Depending on the site, you can use file_get_contents or, if the site requires POST requests (e.g. a login), you would have to use cURL.


The first option would look something like this:


$url = "http://www.blah.com";
$conn = fopen($url, "r");
// read the whole page into a string (http:// URLs need allow_url_fopen enabled)
$html = file_get_contents($url);
echo $html;
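For the second option (a site that needs a POST, such as a login form), here is a minimal cURL sketch. The function name post_form, the cookie-file location, and the timeouts are my own assumptions, not anything a particular site requires, and it assumes the curl extension is installed.

```php
<?php
// Hypothetical helper: send a POST request (e.g. a login form) with cURL and
// return the response body, or null on failure. Cookies are stored in
// $cookieFile so that later requests can reuse the logged-in session.
function post_form(string $url, array $fields, ?string $cookieFile = null): ?string
{
    $cookieFile = $cookieFile ?? sys_get_temp_dir() . '/scraper_cookies.txt';
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_POST           => true,
        CURLOPT_POSTFIELDS     => http_build_query($fields), // url-encoded body
        CURLOPT_RETURNTRANSFER => true,        // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,        // logins often redirect afterwards
        CURLOPT_COOKIEJAR      => $cookieFile, // save received cookies here
        CURLOPT_COOKIEFILE     => $cookieFile, // and send them back with requests
        CURLOPT_CONNECTTIMEOUT => 10,
        CURLOPT_TIMEOUT        => 30,
    ]);
    $body = curl_exec($ch);
    // the handle is released when $ch goes out of scope
    return $body === false ? null : $body;
}
```

Usage would be something like `$html = post_form('http://www.example.com/login.php', array('username' => 'me', 'password' => 'secret'));`, where the URL and field names are placeholders: take the real ones from the site's login form. Because the cookies are saved to a file, a later cURL request that sets the same CURLOPT_COOKIEFILE can fetch pages that are only visible when logged in.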


@Maq, yeah, but the line

$conn = fopen($url, "r");

is not required, unless you're using fread() to read only a small chunk.
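A sketch of that fread() approach, reading only the first few kilobytes of a page (for example, when everything you need is near the top). The name read_head and the 8192-byte default are hypothetical choices.

```php
<?php
// Hypothetical helper: read at most $maxBytes from the start of a URL or file
// with fopen()/fread() instead of downloading the whole page.
function read_head(string $url, int $maxBytes = 8192): ?string
{
    $conn = @fopen($url, 'r');
    if ($conn === false) {
        return null; // could not open the URL or file
    }
    $chunk = '';
    // fread() on a network stream may return fewer bytes than requested,
    // so keep reading until we have enough or reach the end of the stream
    while (!feof($conn) && strlen($chunk) < $maxBytes) {
        $part = fread($conn, $maxBytes - strlen($chunk));
        if ($part === false) {
            break;
        }
        $chunk .= $part;
    }
    fclose($conn);
    return $chunk;
}
```

For what it's worth, file_get_contents() can do the same in one call through its length argument: `file_get_contents($url, false, null, 0, 8192)`.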



tail, what data do you want to extract? Can you post a sample of what $html returns and point out which part you want from it?

$url = "http://www.bigfuntown.com/Game-1497.html";
$html = file_get_contents($url);
echo $html;


My plan is to enter the game's URL in a form field and have an Auto-fill button. When the button is clicked, the script reads the URL from the form field, extracts certain information, and puts it into my other form fields. The information I want to extract is the game's name, category, description, and the link to the Flash file.

I like scraping, so I wrote this for you:


$url = 'http://www.bigfuntown.com/Game-1497.html';
$html = file_get_contents($url);
// capture groups: 1. game title, 2. relative swf link, 3. categories incl. HTML, 4. game description
if ($html !== false && preg_match('~<title>([^<]*).+(/Games/[^.]+\.swf).+Categories:\s*(.*?)<br />.+?GameDescription">([^<]*)~s', $html, $matches)) {
	$game_title = trim($matches[1]);
	$game_swf = 'http://www.bigfuntown.com' . $matches[2];
	$game_desc = trim($matches[4]);
	// the categories are a comma-separated list, possibly wrapped in links
	$game_cat = explode(',', strip_tags($matches[3]));
	$game_cat = array_map('trim', $game_cat);
	// note that $game_cat is an array of categories
}
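To see what each capture group picks up, here is the same pattern (with the dot before swf escaped) wrapped in a hypothetical parse_game() function and run against a made-up HTML fragment. The fragment is only loosely shaped like the real page; the actual bigfuntown.com markup may differ, and when it does a regex like this simply stops matching, so the function returns null in that case.

```php
<?php
// Hypothetical wrapper so the pattern can be tried on any HTML string.
// Returns null when the pattern does not match (e.g. the site's layout changed).
function parse_game(string $html, string $base = 'http://www.bigfuntown.com'): ?array
{
    $pattern = '~<title>([^<]*).+(/Games/[^.]+\.swf).+Categories:\s*(.*?)<br />.+?GameDescription">([^<]*)~s';
    if (!preg_match($pattern, $html, $m)) {
        return null;
    }
    return [
        'title'       => trim($m[1]),
        'swf'         => $base . $m[2],
        'categories'  => array_map('trim', explode(',', strip_tags($m[3]))),
        'description' => trim($m[4]),
    ];
}

// Made-up markup, only loosely shaped like the real page
$sample = <<<HTML
<html><head><title> Space Blaster </title></head><body>
<embed src="/Games/space-blaster.swf">
Categories: <a href="/c/1">Shooting</a>, <a href="/c/2">Platform</a><br />
<div class="GameDescription">Blast the aliens.</div>
</body></html>
HTML;

print_r(parse_game($sample));
```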

The rest, combining the code with a form, should be the easy part. And remember copyright law: don't steal content.
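A rough sketch of combining it with a form: the form posts back to the same script, and after scraping, the script prints the form again with the scraped values in the fields. The function and field names are hypothetical (only autofill_url matches a name used later in this thread); htmlspecialchars() keeps quotes and ampersands in scraped text from breaking the HTML.

```php
<?php
// Hypothetical sketch: one form that posts back to itself. Clicking Auto-fill
// submits the form (including the URL field); the script then scrapes the page
// and renders the form again with the scraped values filled in.
function render_autofill_form(array $values): string
{
    // escape every value before putting it into the HTML
    $v = function (string $key) use ($values): string {
        return htmlspecialchars($values[$key] ?? '', ENT_QUOTES);
    };
    return '<form method="post" action="">'
        . "\n" . '<input type="text" name="autofill_url" value="' . $v('autofill_url') . '">'
        . "\n" . '<button type="submit" name="autofill">Auto-fill</button>'
        . "\n" . '<input type="text" name="title" value="' . $v('title') . '">'
        . "\n" . '<textarea name="description">' . $v('description') . '</textarea>'
        . "\n" . '<input type="text" name="swf" value="' . $v('swf') . '">'
        . "\n</form>";
}

echo render_autofill_form(['title' => 'Space Blaster', 'description' => 'Say "hi" & go']);
```

After scraping, you would pass something like `array('autofill_url' => $url, 'title' => $game_title, 'description' => $game_desc, 'swf' => $game_swf)`.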

Worked like a charm, thank you. Is there any way to select one category instead of all of them? I'm using a drop-down list for the categories, and on that site's games the categories sometimes include high-score entries, like this: "Categories: High Score, Platform, Shooting, Spark High Scores". Alternatively, could I somehow check their categories against a list of my own?

In the original script I created to add games, I extracted the data from websites manually. In this script, I have a drop-down list of categories. I was wondering if I could cross-reference the categories extracted from the remote site against the categories in my list. Maybe using in_array? I'm not really sure how to go about it.

But what happens when a game has more than one category? I take it you want to use the multiple attribute on the select element. This code checks the returned categories in $game_cat against an array of specified categories and pre-selects the matches in the list:


$game_cat = array('High Score', 'Platform', 'Shooting', 'Spark High Scores');
$categories = array('Shooting', 'Flying', 'RPG', 'Platform', 'Racing');
// "cat[]" (with brackets) lets PHP receive every selected option as an array
echo '<select multiple="multiple" name="cat[]">';
foreach ($categories as $category) {
	if (in_array($category, $game_cat)) {
		echo "\n\t<option selected=\"selected\">$category</option>";
	} else {
		echo "\n\t<option>$category</option>";
	}
}
echo "\n</select>";

Output:

<select multiple="multiple" name="cat[]">
	<option selected="selected">Shooting</option>
	<option>Flying</option>
	<option>RPG</option>
	<option selected="selected">Platform</option>
	<option>Racing</option>
</select>
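If you'd rather have a plain single-choice drop-down, one option is to keep only the first remote category that also appears in your list. The sketch below does that; first_known_category() is a hypothetical helper, and unlike in_array() (which is case-sensitive) it compares names case-insensitively and returns the spelling used in your list.

```php
<?php
// Hypothetical helper: return the first remote category that also appears in
// my own list, compared case-insensitively, using my list's spelling.
function first_known_category(array $remote, array $mine): ?string
{
    // map lower-cased names to the spelling used in my list
    $lookup = [];
    foreach ($mine as $name) {
        $lookup[strtolower($name)] = $name;
    }
    foreach ($remote as $name) {
        $key = strtolower(trim($name));
        if (isset($lookup[$key])) {
            return $lookup[$key];
        }
    }
    return null; // nothing matched; the caller can fall back to e.g. 'Other'
}

$game_cat = array('High Score', 'Platform', 'Shooting', 'Spark High Scores');
$categories = array('Shooting', 'Flying', 'RPG', 'Platform', 'Racing');
echo first_known_category($game_cat, $categories); // prints "Platform"
```

The result follows the order of the remote list, so "Platform" wins here because it comes before "Shooting" on the game page.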

Not at all:

// only proceed if the form sent a whole number (the game ID used in the URL)
if (isset($_POST['autofill_url']) && ctype_digit($_POST['autofill_url'])) {
	$autofill_url = $_POST['autofill_url'];
	$autofill_site = 'http://www.bigfuntown.com/Game-' . $autofill_url . '.html';
	$html = file_get_contents($autofill_site);
	if ($html !== false && preg_match('~<title>([^<]*).+(/Games/[^.]+\.swf).+Categories:\s*(.*?)<br />.+?GameDescription">([^<]*)~s', $html, $matches)) {
		$game_title = trim($matches[1]);
		$game_swf = 'http://www.bigfuntown.com' . $matches[2];
		$game_desc = trim($matches[4]);
		$game_cat = explode(',', strip_tags($matches[3]));
		$game_cat = array_map('trim', $game_cat);
		$categories = array('Action','Adventure','Arcade','Bike','Board','Car','Fighting','Multiplayer','Music','Plane','Platform','Puzzle','Other','Racing','Role Playing','Shooting','Simulation','Skill','Sport','Strategy');
		// print only the remote categories that also exist in my list
		foreach ($game_cat as $category) {
			if (in_array($category, $categories)) {
				echo "\n\t<option>$category</option>";
			}
		}
	}
}
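To show the whole list as a single-choice drop-down with the matched category pre-selected, rather than echoing only the matches, something like this hypothetical helper could work. It assumes $selected already holds the exact spelling used in $categories; if nothing matched, pass null, no option is marked selected, and the browser shows the first one.

```php
<?php
// Hypothetical helper: build a single-choice drop-down containing every
// category from my list, with $selected (if any) pre-selected.
function render_category_select(array $categories, ?string $selected, string $name = 'cat'): string
{
    $html = '<select name="' . htmlspecialchars($name) . '">';
    foreach ($categories as $category) {
        $label = htmlspecialchars($category); // escape &, <, > and quotes
        $attr = ($category === $selected) ? ' selected="selected"' : '';
        $html .= "\n\t<option$attr>$label</option>";
    }
    return $html . "\n</select>";
}

$categories = array('Action', 'Platform', 'Shooting', 'Other');
echo render_category_select($categories, 'Platform');
```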

