
Need help with this regular expression


kevinkhan


I want to extract the link and the link title from this HTML:

 

<div class="sresult_address">
						<h2>
							2.
																	<a href="http://www.mydomain.eu/property-for-sale/Cape-View-House-Restored-Georgian-House-With-Stunning-Sea-Views-London/485539/">
										Cape View House, Restored Georgian House With Stunning Sea Views,London 										</a>
															</h2>
					</div>

 

I'm looking for a regular expression that can extract both of these.

 

This is the one I came up with, but it doesn't seem to work :(

 

'~<div class="sresult_address"><h2>\s*[0-9]{1,2}[.]\s*<a href="([^"]*)">(.*?)</a></h2></div>~is';

 

Is there something wrong with this?

https://forums.phpfreaks.com/topic/180995-need-help-with-this-regular-expression/

Regex is not the most suitable tool for tasks like this. One alternative is to use DOMDocument together with DOMXPath:

 

For example:

$html = <<<EOF
<div class="sresult_address">
						<h2>
							2.
																	<a href="http://www.mydomain.eu/property-for-sale/Cape-View-House-Restored-Georgian-House-With-Stunning-Sea-Views-London/485539/">
										Cape View House, Restored Georgian House With Stunning Sea Views,London 										</a>
															</h2>
					</div>
EOF;

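$dom = new DOMDocument;
libxml_use_internal_errors(true); # collect parser warnings from imperfect markup instead of printing them
$dom->loadHTML($html); # for a live page, use loadHTMLFile() instead and pass the page's full URL in quotes
libxml_clear_errors(); # discard the collected warnings
libxml_use_internal_errors(false);
$xpath = new DOMXPath($dom);
# select every <a> with an href inside the <h2> of a <div class="sresult_address">
$aTag = $xpath->query('//div[@class="sresult_address"]/h2/a[@href]');

foreach ($aTag as $val) {
    # trim() strips the indentation and line breaks around the link text
    echo 'URL: ' . $val->getAttribute('href') . "<br />Link Text: " . trim($val->nodeValue) . "<br />\n";
}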

 

Granted, as with many other aspects of programming, there's more than one way to skin a cat. While you can use regex for things like this, in my opinion it's akin to using pliers to hammer a nail into the wall: it can work, but it's a tad awkward.
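For completeness, the pattern in the question most likely fails because it allows no whitespace between `<div class="sresult_address">` and `<h2>`, or between `</a>`, `</h2>` and `</div>`, while the sample markup has line breaks and tabs in all of those places. Below is a minimal sketch of the same pattern with `\s*` added at those points. It assumes the live markup has exactly the structure of the sample, with the indentation shortened here for readability. The `$pattern` and `$results` names are just illustrative.

```php
<?php
// Sample markup from the question (indentation shortened).
$html = <<<EOF
<div class="sresult_address">
	<h2>
		2.
		<a href="http://www.mydomain.eu/property-for-sale/Cape-View-House-Restored-Georgian-House-With-Stunning-Sea-Views-London/485539/">
			Cape View House, Restored Georgian House With Stunning Sea Views,London
		</a>
	</h2>
</div>
EOF;

// Same pattern as in the question, with \s* inserted wherever the
// markup may contain whitespace between tags.
$pattern = '~<div class="sresult_address">\s*<h2>\s*[0-9]{1,2}[.]\s*'
         . '<a href="([^"]*)">(.*?)</a>\s*</h2>\s*</div>~is';

$results = array();
if (preg_match_all($pattern, $html, $matches, PREG_SET_ORDER)) {
    foreach ($matches as $m) {
        // $m[1] is the href value, $m[2] the link text including its
        // surrounding whitespace, which trim() removes.
        $results[] = array('url' => $m[1], 'text' => trim($m[2]));
    }
}

foreach ($results as $r) {
    echo 'URL: ' . $r['url'] . "\nLink Text: " . $r['text'] . "\n";
}
```

Even with this fix the pattern stays brittle: an extra attribute on the `<div>` or a different attribute order on the `<a>` would stop it matching, which is the awkwardness the DOM approach avoids.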

