hamboy Posted May 21, 2010

Hi, I am having problems parsing HTML to strip away code. The original HTML is http://wasted-webspace.net/AL/news.html and I would like to parse it into http://wasted-webspace.net/AL/news_parse.html

I noticed a unique pattern starting at "<table cellspacing=" and ending at "</tr><tr>", but I am getting a "preg_match_all() [function.preg-match-all]: No ending matching delimiter '>' found" error.

This is what I have for my code:

<?
$target_url = "http://wasted-webspace.net/AL/news.html";
$userAgent = 'Googlebot/2.1 (http://www.googlebot.com/bot.html)';

// make the cURL request to $target_url
$ch = curl_init();
curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);
curl_setopt($ch, CURLOPT_URL,$target_url);
curl_setopt($ch, CURLOPT_FAILONERROR, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_AUTOREFERER, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER,true);
curl_setopt($ch, CURLOPT_TIMEOUT, 10);
$html= curl_exec($ch);
curl_close($ch);
if (!$html) {
    echo "<br />cURL error number:" .curl_errno($ch);
    echo "<br />cURL error:" . curl_error($ch);
    exit;
}

$pattern = "<table cellspacing(.*)<td><table border="0">";
preg_match_all($pattern, $html, $results, PREG_PATTERN_ORDER);
print_r($results);
?>
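
For what it's worth, the error quoted in the post above most likely comes from the pattern having no delimiters. PHP's preg functions expect the pattern wrapped in a delimiter pair, and because this one starts with "<", PHP treats that as a bracket-style opening delimiter and goes looking for a matching ">". A minimal sketch of a delimited pattern follows. It assumes the start and end markers named in the post ("<table cellspacing=" and "</tr><tr>"); the sample $html string is made up for illustration and is not the real news.html.

```php
<?php
// Hypothetical sample markup standing in for the fetched page.
$html = '<table cellspacing="0"><tr><td>First story</td></tr><tr>'
      . '<table cellspacing="0"><tr><td>Second story</td></tr><tr>';

// '~' is used as the delimiter so it does not clash with '/' or '<' in the HTML.
// (.*?) is non-greedy, so each block is matched on its own;
// the 's' modifier lets '.' match newlines as well.
$pattern = '~<table cellspacing=(.*?)</tr><tr>~s';

preg_match_all($pattern, $html, $results, PREG_PATTERN_ORDER);

// $results[0] holds the full matches, $results[1] the captured text.
print_r($results[1]);
```

Single quotes around the pattern also sidestep the unescaped double quotes in the original $pattern line, which would otherwise need escaping as \".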

hamboy Posted May 23, 2010

bump!

Rustywolf Posted May 23, 2010

http://www.justin-cook.com/wp/2006/03/31/php-parse-a-string-between-two-strings/

I'd suggest something like that.
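
A rough sketch of the "string between two strings" idea, using plain strpos/substr instead of a regex. This is not the code from the linked article; the function name string_between and the sample markup are hypothetical, and the type declarations assume PHP 7.1 or later.

```php
<?php
// Hypothetical helper: returns the text between the first $start marker
// and the next $end marker, or null when either marker is missing.
function string_between(string $haystack, string $start, string $end): ?string
{
    $from = strpos($haystack, $start);
    if ($from === false) {
        return null;
    }
    $from += strlen($start);          // skip past the start marker itself

    $to = strpos($haystack, $end, $from);
    if ($to === false) {
        return null;
    }
    return substr($haystack, $from, $to - $from);
}

// Made-up sample markup, not the real page.
$html = '<table cellspacing="0"><tr><td>First story</td></tr><tr>';
echo string_between($html, '<td>', '</td>'), "\n";
```

This only finds the first occurrence; to collect every block, the same search could be repeated in a loop starting from the previous end position.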