
HTML parsing using PHP Regex Help


hamboy


Hi, I am having trouble parsing HTML to strip away unwanted code. The original HTML is:

http://wasted-webspace.net/AL/news.html

 

I would like to parse it into:

 

http://wasted-webspace.net/AL/news_parse.html

 

I noticed that the part I want follows a unique pattern, starting with "<table cellspacing=" and ending with "</tr><tr>".

 

But I am getting a "preg_match_all() [function.preg-match-all]: No ending matching delimiter '>' found" error.
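For context, this warning usually means the pattern string was passed without regex delimiters: PHP's PCRE functions treat the first character of the pattern as the opening delimiter, so a pattern beginning with "<" makes PHP look for a closing ">". Here is a minimal sketch of the difference. The sample `$html` string and the "#" delimiter are assumptions for illustration, not taken from the real page.

```php
<?php
// Hypothetical sample input, not the markup of the real news page.
$html = '<table cellspacing="0"><tr><td>news</td></tr></table>';

// No delimiters: the leading "<" is taken as the opening delimiter and PHP
// cannot find a balanced closing ">", which produces the warning.
// $bad = '<table cellspacing(.*)<td>';

// Explicit "#" delimiters, so "/" inside HTML tags needs no escaping.
// (.*?) is lazy, and the "s" modifier lets "." match newlines as well.
$good = '#<table cellspacing(.*?)<td>#s';

$count = preg_match_all($good, $html, $results, PREG_PATTERN_ORDER);

// $results[1] holds whatever the first capture group matched.
print_r($results[1]);
```

Any non-alphanumeric, non-backslash, non-whitespace character can serve as the delimiter. "#" or "~" tend to be convenient for HTML because "/" appears in every closing tag.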

 

This is the code I have:

<?php

$target_url = "http://wasted-webspace.net/AL/news.html";
$userAgent = 'Googlebot/2.1 (http://www.googlebot.com/bot.html)';

// make the cURL request to $target_url
$ch = curl_init();
curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);
curl_setopt($ch, CURLOPT_URL, $target_url);
curl_setopt($ch, CURLOPT_FAILONERROR, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_AUTOREFERER, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_TIMEOUT, 10);
$html = curl_exec($ch);

// report any cURL failure before closing the handle,
// otherwise curl_errno() and curl_error() have nothing to read
if (!$html) {
    echo "<br />cURL error number:" . curl_errno($ch);
    echo "<br />cURL error:" . curl_error($ch);
    curl_close($ch);
    exit;
}
curl_close($ch);

// quotes inside a double-quoted PHP string must be escaped
$pattern = "<table cellspacing(.*)<td><table border=\"0\">";

// this call is where the "No ending matching delimiter" error is reported
preg_match_all($pattern, $html, $results, PREG_PATTERN_ORDER);

print_r($results);

?>
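One way the pattern in the code above could be repaired is sketched below. It assumes the goal is the block described earlier, from "<table cellspacing=" to "</tr><tr>". The heredoc is a hypothetical stand-in for the fetched page (the real markup may well differ), and `$items` is just an illustrative name.

```php
<?php
// Hypothetical stand-in for the HTML returned by curl_exec().
$html = <<<HTML
<table cellspacing="0"><tr><td>First item</td></tr><tr>
<td>ignored</td></tr></table>
<table cellspacing="0"><tr><td>Second item</td></tr><tr>
<td>ignored</td></tr></table>
HTML;

// "#" delimiters wrap the pattern. The lazy (.*?) makes each match stop at
// the nearest "</tr><tr>", and the "s" modifier lets "." span line breaks.
$pattern = '#<table cellspacing=(.*?)</tr><tr>#s';

// preg_match_all() returns false on failure, otherwise the number of matches.
$count = preg_match_all($pattern, $html, $results, PREG_PATTERN_ORDER);
if ($count === false) {
    echo 'Regex error code: ' . preg_last_error();
    exit;
}

// $results[0] holds the full matches; strip_tags() leaves only their text.
$items = array_map('strip_tags', $results[0]);

print_r($items);
```

With a greedy `(.*)` instead of `(.*?)`, the match would run from the first "<table cellspacing=" to the last "</tr><tr>" in the page, which is probably not what is wanted when the block repeats.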


