[SOLVED] help with php and regex

CodeMama · April 10, 2009

I'm trying to strip out all tags except <br> from some data so I can put it in a database.

How can I write this:

<?php


$TESTING = TRUE;


$target_url = "http://www.awebsite.com";

$userAgent = 'Googlebot/2.1 (http://www.googlebot.com/bot.html)';


$ch = curl_init();
curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);
curl_setopt($ch, CURLOPT_URL,$target_url);
curl_setopt($ch, CURLOPT_FAILONERROR, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_AUTOREFERER, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER,true);
curl_setopt($ch, CURLOPT_TIMEOUT, 100);
$html = curl_exec($ch);
if (!$html) {
    echo "<br />cURL error number:" .curl_errno($ch);
    echo "<br />cURL error:" . curl_error($ch);
    exit;
}


// parse the html into a DOMDocument
$dom = new DOMDocument();
@$dom->loadHTML($html);

echo $html;

// explode() splits on a literal string; split() is deprecated and removed in PHP 7
$graphs = explode("<p", $html);

// Start at 6 to clear out junk at top. Use $i+1 since last paragraph
//        is footnote that is not needed.
for ($i = 6; $i + 1 < count($graphs); $i++)
{
    if ($TESTING)
        echo "$i: $graphs[$i]<br />";

    // Attempt at stripping every tag except <br> (not valid PHP as written):
    //   $graphs->getAttribute('graphs');
    //   $clean = $graphs(\<)(?!br(\s|\/|\>))(.*?\>);

    // Split the paragraph into lines
    $lines = explode("<br", $graphs[$i]);
    foreach ($lines as $j => $line)
    {
        if ($TESTING)
            echo "$j: $line<br />";
    }

    // Grab restaurant name

    // Grab address

    // Grab city

    // Grab date and visit type

    // Grab rest of text and store it. Grab numbers of violations?
}

jackpf · April 11, 2009

$string = strip_tags($string, '<br>');
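
For anyone finding this later, here is a minimal sketch of how that one-liner might slot into the paragraph loop from the question. The helper name `keep_only_br` and the sample string are made up for illustration and are not from the original code. The type hints assume PHP 7 or later.

```php
<?php
// Hypothetical helper (the name is illustrative, not from the thread).
function keep_only_br(string $html): string
{
    // Second argument lists the tags to keep. Listing '<br>' also keeps '<br/>'.
    return strip_tags($html, '<br>');
}

// Made-up sample paragraph standing in for one entry of $graphs.
$sample = '<p class="entry"><b>Some Diner</b><br>123 Main St<br/><a href="#">Anytown</a></p>';

$clean = keep_only_br($sample);

// Split on the remaining line breaks, tolerating <br>, <br/> and <br />.
$lines = preg_split('/<br\s*\/?>/i', $clean);

foreach ($lines as $j => $line) {
    echo $j, ': ', trim($line), PHP_EOL;
}
```

Because `strip_tags()` does the filtering, no hand-written negative-lookahead regex is needed. The only pattern left is the small one used to split on the line breaks.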
