
Minor changes & tailoring of a regex that already works very well


dilbertone


 

Good day dear community.

 

I need to build a function which parses the domain from a URL. I have used various ways to parse HTML sources, but this one is a bit tricky! See the target I want to parse; it has some invalid markup:

 

http://www.schulministerium.nrw.de/BP/SchuleSuchen?action=644.0013008534253&SchulAdresseMapDO=194190
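For the domain part on its own, a minimal sketch could rely on PHP's built-in parse_url() rather than a regex. The helper name get_domain() is only a placeholder I made up for illustration:

```php
<?php
// Hypothetical helper (name is an assumption): returns the host part of a URL,
// or null when no host can be found.
function get_domain(string $url): ?string
{
    $host = parse_url(trim($url), PHP_URL_HOST);
    // parse_url() yields false for seriously malformed URLs and null when the host is missing
    return is_string($host) ? strtolower($host) : null;
}

$url = 'http://www.schulministerium.nrw.de/BP/SchuleSuchen?action=644.0013008534253&SchulAdresseMapDO=194190';
echo get_domain($url), "\n"; // prints www.schulministerium.nrw.de
```

This only covers the URL itself; it does nothing about the invalid markup on the page.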

 

Well, what do you think: can I apply this code here?

 

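<?php
require_once('config.php'); // DB connection; ASSUMPTION: config.php defines a mysqli handle named $conn
$filename = "url.txt"; // text file containing one URL per line
$each_line = file($filename, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
// Prepare the insert once; a bound parameter avoids SQL injection from scraped text
$stmt = mysqli_prepare($conn, "insert into tablename(contents) values (?)");
foreach ($each_line as $line_num => $line)
{
    $line = trim($line);
    $content = file_get_contents($line);
    if ($content === false) {
        continue; // skip URLs that could not be fetched
    }
    //echo ($content)."<br>";
    $pattern = '/<td>(.*?)<\/td>/si'; // matches only bare <td> tags without attributes
    preg_match_all($pattern, $content, $matches);

    foreach ($matches[1] as $match) {
        $match = trim(strip_tags($match));
        //var_dump($match);
        mysqli_stmt_bind_param($stmt, 's', $match);
        mysqli_stmt_execute($stmt);
    }
}
?>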

 

Well, I have to rework the parser part of this script. I need to parse somewhat differently, since I have a different site here.

 

Can anybody help me get a better regex, or a better way to parse this site?
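One direction I am considering instead of the regex is PHP's DOMDocument with DOMXPath, which also matches cells that carry attributes and tends to tolerate invalid markup. This is only a rough sketch: the helper name extract_cells() and the sample HTML are made up for illustration, and I have not checked it against the real page.

```php
<?php
// Hypothetical helper (name is an assumption): collects the trimmed text of
// every non-empty <td> cell found in an HTML string.
function extract_cells(string $html): array
{
    $doc = new DOMDocument();
    // Collect parser warnings internally instead of printing them for bad markup
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $cells = [];
    $xpath = new DOMXPath($doc);
    foreach ($xpath->query('//td') as $td) {
        // textContent already drops nested tags, so strip_tags() is not needed
        $text = trim($td->textContent);
        if ($text !== '') {
            $cells[] = $text;
        }
    }
    return $cells;
}

// Made-up sample: one cell has an attribute and a nested tag, one is empty
$sample = '<table><tr><td class="a">Name <b>Schule</b></td><td> Ort </td><td></td></tr></table>';
print_r(extract_cells($sample)); // the two cells "Name Schule" and "Ort"
```

In the script above, the preg_match_all() step would then be replaced by a call like extract_cells($content), with the database insert left unchanged. Note that loadHTML() guesses the character encoding when the page does not declare one, so umlauts may need extra care.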

 

Any and all help will be greatly appreciated.

 

regards

db1

 

 

