dilbertone
Posted December 18, 2010

Good day, dear community.

I need to build a function that parses the domain from a URL. I have used various ways to parse HTML sources, but this one is a bit tricky. The target page I want to parse has some invalid markup:

http://www.schulministerium.nrw.de/BP/SchuleSuchen?action=644.0013008534253&SchulAdresseMapDO=194190

What do you think: can I apply this code here?

<?php
require_once('config.php'); // DB connection; assumed here to define a mysqli handle $db

$filename  = "url.txt"; // text file with one URL per line
$each_line = file($filename);

foreach ($each_line as $line_num => $line) {
    $line    = trim($line);
    $content = file_get_contents($line); // fetch the page
    //echo ($content)."<br>";

    // Capture the contents of every <td>...</td> cell
    $pattern = '/<td>(.*?)<\/td>/si';
    preg_match_all($pattern, $content, $matches);

    foreach ($matches[1] as $match) {
        $match = strip_tags($match);
        $match = trim($match);
        //var_dump($match);

        // mysqli_query() takes the connection as its first argument; escape the value before inserting
        $value = mysqli_real_escape_string($db, $match);
        $sql   = mysqli_query($db, "insert into tablename(contents) values ('$value')");
        //echo $match;
    }
}
?>

I have to rework the parser part of this script. I need to parse somewhat differently, since I am dealing with a different site here. Can anybody help me get a better regex, or a better way to parse this site?

Any and all help will be greatly appreciated.

Regards,
db1
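
For what it's worth, one possible alternative to a regex on a page with invalid markup is PHP's DOM extension, which repairs broken HTML while loading it. The following is only a minimal sketch, not a tested solution for the site above: `extractCells()` is a hypothetical helper name, the sample string is made up, and it assumes the wanted data sits in `<td>` cells as in the script.

```php
<?php
// Sketch: pull the text of every <td> cell out of possibly invalid HTML.
// extractCells() is a hypothetical helper, not part of the original script.
function extractCells(string $html): array
{
    // loadHTML() rejects empty input, so bail out early.
    if (trim($html) === '') {
        return [];
    }

    $doc = new DOMDocument();
    // Collect libxml warnings about broken markup instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $cells = [];
    foreach ($doc->getElementsByTagName('td') as $td) {
        // textContent drops nested tags, much like strip_tags() in the script.
        $text = trim(preg_replace('/\s+/', ' ', $td->textContent));
        if ($text !== '') {
            $cells[] = $text;
        }
    }
    return $cells;
}

// Made-up sample with a missing </td>, standing in for the real page.
$sample = '<table><tr><td>Name <b>Schule</b><td>Ort: Bonn</tr></table>';
print_r(extractCells($sample));
```

In the original loop, `extractCells($content)` would take the place of the `preg_match_all()` call, and the inner `foreach` would iterate over its return value. Note that `loadHTML()` guesses the character encoding when the page does not declare one, so umlauts may need extra handling.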