
How to grab specific html tags from webpage


soma56


I found this code which appears to be a more advanced type of 'preg_match'.

 

<?php
// Return every substring of $string found between $start and $stop.
function return_between($string, $start, $stop){
	$list = array();
	$offset = 0;
	while (($begin = strpos($string, $start, $offset)) !== false) {
		$begin += strlen($start); // skip the whole start delimiter, not just one character
		$end = strpos($string, $stop, $begin);
		if ($end === false)
			break; // no closing delimiter left
		$list[] = substr($string, $begin, $end - $begin);
		$offset = $end + strlen($stop); // continue searching after the stop delimiter
	}
	return $list;
}

$text = 'This is a %string%, get text %some% text...';

print_r(return_between($text, '%', '%'));
?>

 

My question is if I had a page like this:

 

sample.html

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Sample Page</title>

<link href="_css/styles.css" rel="stylesheet" type="text/css" />

</head>

<body>

<h1>Hello PHP Freaks!</h1>
<p>I've received a lot of help here in the past few months and 
I greatly appreciate it. I <td class='s'>Highly Recommend <a href="http://www.phpfreaks.com"></a></td> 
reading the posts in this forum as I have found them to be very 
informative</p>
</body>
</html>

 

And I wanted to grab everything between

 

<td class='s'>

 

and

 

<a href="

 

(This would result in "Highly Recommend")

 

Would I be better off trying to learn 'preg_match' or trying to make it work with the code above?
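
For comparison, here is a rough sketch of what the 'preg_match' route might look like. The helper name grab_between_regex is made up for illustration (it is not from the code above), and the sample markup is pasted in as a string instead of being read from sample.html. It uses preg_quote() so the delimiters are treated as literal text, and preg_match_all() so every occurrence is returned:

```php
<?php
// Hypothetical helper (name invented for this sketch): returns every piece
// of text found between the literal strings $start and $stop.
function grab_between_regex($html, $start, $stop) {
	// preg_quote() escapes regex metacharacters in the delimiters.
	// (.*?) is a lazy capture; the "s" modifier lets "." match newlines too.
	$pattern = '/' . preg_quote($start, '/') . '(.*?)' . preg_quote($stop, '/') . '/s';
	preg_match_all($pattern, $html, $matches);
	return $matches[1]; // the captured text of each match
}

// Assumed sample input, taken from the relevant line of sample.html.
$html = "<p>I <td class='s'>Highly Recommend <a href=\"http://www.phpfreaks.com\"></a></td> reading</p>";

print_r(grab_between_regex($html, "<td class='s'>", '<a href="'));
```

On this input the single match should be "Highly Recommend" followed by a trailing space, so trim() may be wanted. Either approach only treats the page as plain text, so it will break if the markup changes (for example, double quotes around the class name).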

 

This is the route I'm taking so far to figure this out:

 

<?PHP
//Get the page contents
$page = file_get_contents('sample.html');

//call to return between (it takes three arguments and returns an array of matches)
$infoIneed = return_between($page, "<td class='s'>", "<a href=\"");

//separate each match into a list of words
foreach ($infoIneed as $match) {
	$data = explode(" ", trim($match));
	foreach ($data as $value) {
		if (empty($value)) {
			var_dump($value);
		} else {
			echo $value . "<br />";
		}
	}
}

?>

 

I figured it out. Here's what did it if anyone is having the same problem:

 

//defined at the top level so it is not redeclared every time parse() runs
if(!function_exists('return_between')){
function return_between($string, $start, $stop){
	$list = array();
	$offset = 0;
	while (($begin = strpos($string, $start, $offset)) !== false) {
		$begin += strlen($start); //skip the whole start delimiter
		$end = strpos($string, $stop, $begin);
		if ($end === false)
			break; //no closing delimiter left
		$list[] = substr($string, $begin, $end - $begin);
		$offset = $end + strlen($stop);
	}
	return $list;
}
}

if(!function_exists('parse')){
function parse($page){ //$page is the page source to search

	$temp = return_between($page, '<td class=\'s\'>', '<a');

	echo "<br />";

	$i = 0;
	foreach($temp as $match) {
		$i = $i + 1;
		echo $match . "<br />" . PHP_EOL;
		//push output to the browser as each match is printed
		if (ob_get_level() > 0) {
			ob_flush();
		}
		flush();
		usleep(50000);
		if ($i == 100) { //stop after 100 matches
			exit;
		}
	}
}
}
parse(file_get_contents('sample.html'));
