[SOLVED] PHP and Regular Expression in html page


Jagarm

Hello everyone,

 

I am trying to extract some content from a page. The following is what I have so far.

 

$content = file_get_contents ( "http://www.dfo-mpo.gc.ca/media/news-presse-eng.htm" );

echo preg_match_all("/<li>(<[^\r]*?)<\/li>/", $content, $maches,PREG_SET_ORDER);
echo "<pre>";
print_r ( $maches );
echo "</pre>";

 

If you view the source of that page, you will see what I am after: I want to extract the contents of every <li>...</li> element that contains a hyperlink.

 

I've been trying hard with no luck. I have a regex testbed where I test the pattern; it works there but not in PHP.

 

I would appreciate your help.

 

Thanks

When you need to look through the tags on a page, you can use DOMDocument and XPath instead of a regular expression.

So if I understand you correctly, you want to only fetch <li> tags with links within them? Perhaps something along the lines of:

 

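$dom = new DOMDocument;
// Real-world HTML is rarely valid; collect libxml parse warnings instead of printing them.
libxml_use_internal_errors(true);
$dom->loadHTMLFile('http://www.dfo-mpo.gc.ca/media/news-presse-eng.htm');
libxml_clear_errors();
$xpath = new DOMXPath($dom);
// Select every <a> element that is a direct child of an <li>.
$aTag = $xpath->query('//li/a');

foreach ($aTag as $val) {
    echo 'href="' . $val->getAttribute('href') . '" - ' . $val->nodeValue . "<br />\n";
}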

 

Output:

href="/media/news-presse-eng.htm" - News Releases
href="/media/charges-inculpations-eng.htm" - Charges and Convictions
href="/media/back-fiche-eng.htm" - Backgrounders
href="/media/statement-declarations-eng.htm" - Ministerial Statements
href="/media/speeches-discours-eng.htm" - Speeches
href="http://www.glf.dfo-mpo.gc.ca/comm/nr-cp/alert-avis-e.php" - E-News 
href="/media/infocus-alaune-eng.htm" - Infocus
href="/media/contacts-eng.htm" - Contacts
.
.
.
etc
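
If what you want is the whole content of each <li> that contains a link, rather than just the link itself, a variation on the same approach might look like the sketch below. It is only an illustration: the sample markup is made up so the snippet runs without a network connection, and extractLinkedItems() is a hypothetical helper name, not something built into PHP. The XPath expression //li[a] selects the <li> elements that have an <a> as a direct child (use //li[.//a] if the link may be nested deeper).

```php
<?php
// Made-up sample markup standing in for the real page (an assumption for this sketch).
$html = <<<HTML
<ul>
  <li><a href="/media/news-presse-eng.htm">News Releases</a></li>
  <li>Plain item without a link</li>
  <li><a href="/media/contacts-eng.htm">Contacts</a> (updated)</li>
</ul>
HTML;

// Hypothetical helper: returns the inner HTML of every <li> that has an <a> child.
function extractLinkedItems($html)
{
    $dom = new DOMDocument;

    // Collect libxml parse warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $items = array();

    // //li[a] keeps only the <li> elements with a direct <a> child.
    foreach ($xpath->query('//li[a]') as $li) {
        $inner = '';
        // Serialize each child node to rebuild the markup between <li> and </li>.
        foreach ($li->childNodes as $child) {
            $inner .= $dom->saveHTML($child);
        }
        $items[] = trim($inner);
    }

    return $items;
}

$items = extractLinkedItems($html);
print_r($items);
```

With the sample markup above, the item without a link is skipped and the other two are returned. To run it against the live page, you could pass the result of file_get_contents() into the helper.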
