


Parsing a Remote HTML File


2 replies to this topic

#1 tonyr1988


Posted 05 July 2006 - 04:13 AM

I need to be able to grab some information from a remote HTML file.

All I have so far is:

$content = file_get_contents($url);

The problem is, I have no idea what to do next. I need the contents of the <dd> </dd> tags, preferably in an array. I guess I could keep searching for a <dd> tag, throw away everything before it plus 4 characters (the length of the tag), and take everything up to the </dd> tag, but that seems really drawn out and confusing...
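For what it's worth, the "find the tag, skip its length, read up to the closing tag" idea is less work than it sounds with `strpos()` and `substr()`. A minimal sketch, assuming the tags appear literally as `<dd>` and `</dd>` (no attributes, lower case); `extract_between()` is a hypothetical helper name, not a built-in:

```php
<?php
// Hypothetical helper: collect every piece of text between $open and $close
// using plain string searching (strpos/substr), no regular expressions.
function extract_between(string $haystack, string $open, string $close): array
{
    $results = [];
    $offset  = 0;

    while (($start = strpos($haystack, $open, $offset)) !== false) {
        $start += strlen($open);               // skip past the opening tag
        $end = strpos($haystack, $close, $start);
        if ($end === false) {
            break;                             // unclosed tag: stop searching
        }
        $results[] = substr($haystack, $start, $end - $start);
        $offset = $end + strlen($close);       // resume after the closing tag
    }

    return $results;
}

// In the real script $html would be the result of file_get_contents($url).
$html = '<dl><dt>Name</dt><dd>Tony</dd><dt>Posts</dt><dd>2</dd></dl>';
print_r(extract_between($html, '<dd>', '</dd>')); // the two <dd> values: Tony and 2
```

Using `strlen($open)` instead of a hard-coded 4 means the same helper works for any pair of tags.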

Can I use ereg to do this? I have never used it, so I have no clue.
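On the ereg question: the POSIX `ereg*` functions were deprecated in PHP 5.3 and removed in PHP 7.0, and the PCRE `preg_*` functions are the usual replacement. `preg_match_all()` also hands back an array directly. A hedged sketch, assuming `<dd>` carries no attributes; `dd_contents()` is a hypothetical name:

```php
<?php
// Sketch: grab the text of every <dd>...</dd> element with PCRE.
// Assumes <dd> has no attributes; the "i" flag also accepts <DD>,
// and the "s" flag lets (.*?) span line breaks.
function dd_contents(string $html): array
{
    if (preg_match_all('#<dd>(.*?)</dd>#si', $html, $matches)) {
        return $matches[1];   // [1] holds only what the (.*?) group captured
    }
    return [];                // no <dd> elements found
}

// In the real script $html would come from file_get_contents($url).
$html = "<dl>\n<dt>Colour</dt>\n<dd>Red</dd>\n<dt>Size</dt>\n<DD>Large\n</DD>\n</dl>";
print_r(dd_contents($html));
```

The `#` delimiter is used so the `/` in `</dd>` does not need escaping.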

Can someone please get me started?

#2 indalecio


Posted 05 July 2006 - 07:46 AM

I have a better answer... Databases are your friend... learn to use them.

#3 mrwhale


Posted 05 July 2006 - 08:31 AM

I made this for you. :)

Just edit the settings. This code grabs the HTML and echoes it, and you can choose whether or not the tags are echoed as well ;) Have fun!

In this example it gets all the bold (<b>) tags on the page and echoes each one on a separate line :) With show_tags set to 0 it removes the tags first, leaving just the information between them.
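A note on how the matching works: with the default `PREG_PATTERN_ORDER`, `preg_match_all()` puts every full match (tags included) in `$match[0]` and whatever the first `(...)` group captured (tags already gone) in `$match[1]`. A small sketch on a made-up string, not taken from the real page:

```php
<?php
// Sketch: what preg_match_all() fills $match with (default PREG_PATTERN_ORDER).
$page = 'Cash: <b>$500</b>, Rank: <b>12th</b>';   // made-up sample text

preg_match_all('#<b>(.*?)</b>#s', $page, $match);

// $match[0]: each full match, tags included.
// $match[1]: only the text captured by (.*?).
foreach ($match[0] as $i => $full) {
    echo $full, ' => ', $match[1][$i], "\n";
}
```

So when the tags are not wanted, reading `$match[1]` skips the `str_replace()` pass; the script in this post works from the full matches because its `show_tags` option needs them.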

here it is in action: http://www.business-...com/example.php

<?php

// Settings: edit these to suit your page.
$config['url']       = "http://www.business-tycoon.com"; // URL of the HTML page to grab
$config['start_tag'] = "<b>";   // text where grabbing starts
$config['end_tag']   = "</b>";  // text where grabbing stops
$config['show_tags'] = 0;       // 1 = include the tags in the output, 0 = strip them

class grabber
{
	var $error = '';
	var $html  = array(); // full matches, tags included; stays empty on failure

	function grabhtml( $url, $start, $end )
	{
		$file = file_get_contents( $url );

		if( $file === false )
		{
			$this->error = "Site cannot be found!";
			return;
		}

		// Escape the tags so characters such as '.', '?' or '/' are matched literally.
		$pattern = '#' . preg_quote( $start, '#' ) . '(.*?)' . preg_quote( $end, '#' ) . '#s';

		if( preg_match_all( $pattern, $file, $match ) )
		{
			$this->html = $match[0];
		}
		else
		{
			$this->error = "Tags cannot be found.";
		}
	}

	function strip( $html, $show, $start, $end )
	{
		if( !$show )
		{
			// Remove both the opening and the closing tag.
			$html = str_replace( array( $start, $end ), "", $html );
		}

		return $html;
	}
}

$grab = new grabber;
$grab->grabhtml( $config['url'], $config['start_tag'], $config['end_tag'] );

echo $grab->error;

// $grab->html is an empty array if anything went wrong, so this loop is always safe.
foreach( $grab->html as $html )
{
	echo htmlspecialchars( $grab->strip( $html, $config['show_tags'], $config['start_tag'], $config['end_tag'] ) ) . "<br>";
}

?>
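Matching tags with a regex (or `str_replace()`) stops working as soon as a tag carries attributes, e.g. `<dd class="x">`. A hedged alternative that swaps in PHP's DOM extension (`DOMDocument::loadHTML()` plus `getElementsByTagName()`) instead of the regex approach used in this thread; `dd_texts()` is a hypothetical helper name, and the sketch assumes the DOM extension is enabled, as it is in most PHP builds:

```php
<?php
// Sketch of a parser-based alternative: let DOMDocument build a tree and
// read every <dd> element, instead of matching tags with a regex.
function dd_texts(string $html): array
{
    $doc = new DOMDocument();
    // Real-world pages are rarely valid HTML; keep libxml warnings quiet.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $texts = [];
    foreach ($doc->getElementsByTagName('dd') as $dd) {
        $texts[] = trim($dd->textContent);   // text only, nested tags dropped
    }
    return $texts;
}

// In practice: dd_texts(file_get_contents($url))
print_r(dd_texts('<dl><dt>A</dt><dd class="x">one <b>1</b></dd><dd>two</dd></dl>'));
```

The attribute on the first `<dd>` and the nested `<b>` need no special handling here, which is the main reason to prefer a parser once the page gets more complicated.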




