kevinkhan Posted November 10, 2009

I want to extract the link and the link title from HTML in this format:

```html
<div class="sresult_address">
    <h2>
        2. <a href="http://www.mydomain.eu/property-for-sale/Cape-View-House-Restored-Georgian-House-With-Stunning-Sea-Views-London/485539/">
            Cape View House, Restored Georgian House With Stunning Sea Views,London
        </a>
    </h2>
</div>
```

I'm looking for a regular expression that can extract these. This is the one I came up with, but it doesn't seem to work:

```php
'~<div class="sresult_address"><h2>\s*[0-9]{1,2}[.]\s*<a href="([^"]*)">(.*?)</a></h2></div>~is';
```

Is there something wrong with this?
cags Posted November 10, 2009

Yes. Between the end of the opening div tag and the beginning of the opening <h2> tag you have nothing in your pattern, whereas the HTML quite clearly has a lot of whitespace there. The same applies between nearly all of the other tags, including the closing ones.
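To illustrate that fix, here is a minimal sketch that adds `\s*` at each point where the markup may contain whitespace. The sample `$html` assumes the question's markup is spread over several indented lines; the exact indentation on the live page is unknown. The extra `\s*` on either side of `(.*?)` also keeps the surrounding newlines out of the captured link text.

```php
<?php
// Sample markup from the question, laid out with the line breaks and
// indentation that the original pattern did not allow for (assumed layout).
$html = <<<'HTML'
<div class="sresult_address">
    <h2>
        2. <a href="http://www.mydomain.eu/property-for-sale/Cape-View-House-Restored-Georgian-House-With-Stunning-Sea-Views-London/485539/">
            Cape View House, Restored Georgian House With Stunning Sea Views,London
        </a>
    </h2>
</div>
HTML;

// Original pattern: expects the tags to touch each other with no whitespace.
$original = '~<div class="sresult_address"><h2>\s*[0-9]{1,2}[.]\s*<a href="([^"]*)">(.*?)</a></h2></div>~is';

// Same pattern with \s* wherever whitespace may appear: between tags and
// around the link text, so the captured title comes out without newlines.
$fixed = '~<div class="sresult_address">\s*<h2>\s*[0-9]{1,2}[.]\s*<a href="([^"]*)">\s*(.*?)\s*</a>\s*</h2>\s*</div>~is';

$originalCount = preg_match_all($original, $html);
$fixedCount = preg_match_all($fixed, $html, $matches, PREG_SET_ORDER);

echo "original pattern matches: $originalCount\n"; // 0
echo "fixed pattern matches: $fixedCount\n";

// With PREG_SET_ORDER, each $m is [full match, href, link text].
foreach ($matches as $m) {
    echo 'URL: ' . $m[1] . "\n";
    echo 'Link Text: ' . $m[2] . "\n";
}
```

The pattern is still tied to this exact markup; an extra attribute on the `<a>` tag or a different class order would break it again.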
nrg_alpha Posted November 10, 2009

Regex is not the most suitable tool for tasks like this. One alternative is to make use of DOM/DOMXPath. For example:

```php
<?php
$html = <<<EOF
<div class="sresult_address">
    <h2>
        2. <a href="http://www.mydomain.eu/property-for-sale/Cape-View-House-Restored-Georgian-House-With-Stunning-Sea-Views-London/485539/">
            Cape View House, Restored Georgian House With Stunning Sea Views,London
        </a>
    </h2>
</div>
EOF;

$dom = new DOMDocument;
libxml_use_internal_errors(true);
# change loadHTML to loadHTMLFile and use the complete live site's url within quotes in the parenthesis
@$dom->loadHTML($html);
libxml_use_internal_errors(false);

$xpath = new DOMXPath($dom);
$aTag = $xpath->query('//div[@class="sresult_address"]/h2/a[@href]');
foreach ($aTag as $val) {
    echo 'URL: ' . $val->getAttribute('href') . "<br />Link Text: " . $val->nodeValue . "<br />\n";
}
```

Granted, like many other aspects of programming, there's more than one way to skin a cat. It just so happens that while you can use regex for things like this, in my opinion it's akin to using pliers to hammer a nail into the wall (it can work, but it's a tad awkward).
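If the results are needed as data rather than echoed HTML, the same XPath query can be wrapped in a function. This is a sketch under assumptions: `extractListingLinks` is a hypothetical helper name, the two listings and their `example.com` URLs are made-up sample data, and `trim()` is added because `nodeValue` keeps the newlines and indentation around the link text.

```php
<?php
// Hypothetical helper: runs the XPath query above and returns
// ['url' => ..., 'title' => ...] pairs instead of echoing them.
function extractListingLinks(string $html): array
{
    $dom = new DOMDocument();

    // Collect libxml warnings about imperfect HTML instead of printing them,
    // then discard them and restore the previous error-handling setting.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $links = [];
    foreach ($xpath->query('//div[@class="sresult_address"]/h2/a[@href]') as $a) {
        $links[] = [
            'url'   => $a->getAttribute('href'),
            // Strip the whitespace the markup puts around the link text.
            'title' => trim($a->nodeValue),
        ];
    }
    return $links;
}

// Sample data: two listings in the same format as the question's markup.
$html = <<<'HTML'
<div class="sresult_address">
    <h2>
        1. <a href="http://www.example.com/listing/1/">
            First Listing
        </a>
    </h2>
</div>
<div class="sresult_address">
    <h2>
        2. <a href="http://www.example.com/listing/2/">
            Second Listing
        </a>
    </h2>
</div>
HTML;

foreach (extractListingLinks($html) as $link) {
    echo 'URL: ' . $link['url'] . "\n";
    echo 'Link Text: ' . $link['title'] . "\n";
}
```

Unlike the regex, this does not care how the tags are spaced or indented, only about the element structure the XPath expression describes.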