HTML parsing using PHP Regex Help

hamboy · May 21, 2010

Hi, I am having problems parsing HTML to strip away code. The original HTML is

And I would like it to parse into

http://wasted-webspace.net/AL/news_parse.html

I noticed a unique pattern that starts at "<table cellspacing=" and ends at "</tr><tr>".

But I am getting a "preg_match_all() [function.preg-match-all]: No ending matching delimiter '>' found" error.

This is what I have for my code:

<?php

$target_url = "http://wasted-webspace.net/AL/news.html";
$userAgent = 'Googlebot/2.1 (http://www.googlebot.com/bot.html)';

// make the cURL request to $target_url
$ch = curl_init();
curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);
curl_setopt($ch, CURLOPT_URL,$target_url);
curl_setopt($ch, CURLOPT_FAILONERROR, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_AUTOREFERER, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER,true);
curl_setopt($ch, CURLOPT_TIMEOUT, 10);
$html = curl_exec($ch);

// read the error details before closing the handle
if (!$html) {
    echo "<br />cURL error number:" . curl_errno($ch);
    echo "<br />cURL error:" . curl_error($ch);
    curl_close($ch);
    exit;
}

curl_close($ch);

$pattern = "<table cellspacing(.*)<td><table border=\"0\">";



preg_match_all($pattern, $html, $results, PREG_PATTERN_ORDER);

print_r($results);


?>
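For context on the error above: PCRE functions in PHP treat the first character of the pattern as a delimiter, so a pattern that begins with "<" makes preg_match_all() look for a matching closing ">". A minimal sketch of one way around it, assuming "~" as the delimiter and a made-up $html sample in place of the fetched page (the sample markup, the lazy ".*?" and the "s" modifier are illustrative choices, not part of the original code):

```php
<?php
// Hypothetical sample markup standing in for the fetched page.
$html = '<table cellspacing="0"><tr><td>News item</td></tr><td><table border="0">';

// "~" opens and closes the pattern, so "<" and ">" are ordinary characters.
// The "s" modifier lets "." match newlines; ".*?" stops at the first end marker.
$pattern = '~<table cellspacing(.*?)<td><table border="0">~s';

$count = preg_match_all($pattern, $html, $results, PREG_PATTERN_ORDER);

// With PREG_PATTERN_ORDER, $results[1] holds the text captured by the group.
print_r($results[1]);
```

Single quotes around the pattern also avoid having to escape the double quotes in border="0".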

hamboy · May 23, 2010

bump!

Rustywolf · May 23, 2010

http://www.justin-cook.com/wp/2006/03/31/php-parse-a-string-between-two-strings/

I'd suggest something like that.
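A rough sketch of that kind of "string between two strings" approach, using only strpos() and substr(). The function name string_between and the sample markup are hypothetical and not taken from the linked article; the start and end markers are the ones mentioned in the first post:

```php
<?php
// Hypothetical helper: returns the text between the first $start marker and
// the next $end marker, or an empty string when either marker is missing.
function string_between(string $text, string $start, string $end): string
{
    $from = strpos($text, $start);
    if ($from === false) {
        return '';
    }
    $from += strlen($start);           // skip past the start marker itself
    $to = strpos($text, $end, $from);  // search for the end marker after it
    if ($to === false) {
        return '';
    }
    return substr($text, $from, $to - $from);
}

// Made-up sample standing in for the downloaded page.
$sample = '<table cellspacing="0"><tr><td>News item</td></tr><tr><td>rest</td>';
$block = string_between($sample, '<table cellspacing=', '</tr><tr>');
echo $block;
```

This only returns the first occurrence; to collect every block you would have to loop and pass a moving offset, or use a regex as in the original code.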
