CodeMama Posted April 10, 2009

I'm trying to strip out all tags except <br> from some scraped data so I can put it in a database. How can I write this?

<?php
$TESTING = TRUE;
$target_url = "http://www.awebsite.com";
$userAgent = 'Googlebot/2.1 (http://www.googlebot.com/bot.html)';

$ch = curl_init();
curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);
curl_setopt($ch, CURLOPT_URL, $target_url);
curl_setopt($ch, CURLOPT_FAILONERROR, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_AUTOREFERER, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_TIMEOUT, 100);
$html = curl_exec($ch);
if (!$html) {
    echo "<br />cURL error number:" . curl_errno($ch);
    echo "<br />cURL error:" . curl_error($ch);
    exit;
}

// parse the html into a DOMDocument
$dom = new DOMDocument();
@$dom->loadHTML($html);
echo $html;

$graphs = split("<p", $html);

// Start at 6 to clear out junk at top. Use $i+1 since last paragraph
// is footnote that is not needed.
for ($i = 6; $i+1 < count($graphs); $i++) {
    if($TESTING) echo "$i: $graphs[$i]<br />";
    //split the paragraphs into lines
    $graphs->getAttribute('graphs');
    $clean = $graphs(\<)(?!br(\s|\/|\>))(.*?\>);
    $lines = split("<br", $graphs);
    //for ($i = 1; $i+1 < count($lines); $i++) {
    // Grab restaurant name
    if($TESTING) echo "$i: $lines[$i]<br />";
}
// Grab address
// Grab city
// Grab date and visit type
// Grab rest of text and store it. Grab numbers of violations?
jackpf Posted April 11, 2009

$string = strip_tags($string, '<br>');
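To expand on that one-liner, here is a minimal sketch of how strip_tags() might fit into the paragraph/line splitting from the original post. The sample markup in $html is made up for illustration (the real page is not shown in the thread), and the variable names $clean and $lines are simply borrowed from the question. Note that split() was deprecated in PHP 5.3 and removed in PHP 7, so this sketch uses preg_split() instead.

```php
<?php
// Hypothetical sample input; the actual scraped markup is an assumption.
$html = '<p class="x"><b>Joe\'s Diner</b><br />123 Main St<br>Springfield</p>';

// Remove every tag except <br>. Allowing '<br>' also keeps <br/> and <br />.
// The text inside the removed tags is preserved.
$clean = strip_tags($html, '<br>');

// Split on any <br> variant, case-insensitively.
$lines = preg_split('/<br\s*\/?>/i', $clean);

// Trim surrounding whitespace and drop empty lines, then reindex from 0.
$lines = array_values(array_filter(array_map('trim', $lines), 'strlen'));

print_r($lines);
```

With input shaped like this, each entry of $lines would be one field (name, address, city, and so on), ready to be inserted into the database through a prepared statement. If the real markup nests <br> inside other structures, the splitting step may need adjusting.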