Need help with this regular expression


kevinkhan

I want to extract the link and the link title from this HTML:

 

<div class="sresult_address">
						<h2>
							2.
																	<a href="http://www.mydomain.eu/property-for-sale/Cape-View-House-Restored-Georgian-House-With-Stunning-Sea-Views-London/485539/">
										Cape View House, Restored Georgian House With Stunning Sea Views,London 										</a>
															</h2>
					</div>

 

I'm looking for a regular expression that can extract these.

 

This is the one I came up with, but it doesn't seem to work :(

 

'~<div class="sresult_address"><h2>\s*[0-9]{1,2}[.]\s*<a href="([^"]*)">(.*?)</a></h2></div>~is';

 

Is there something wrong with this?


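Yes. Between the end of the opening div tag and the beginning of the opening <h2> tag you have nothing in your pattern, whereas there is quite clearly a lot of whitespace in the markup. The same applies to the closing of nearly all tags. Putting \s* at each of those points lets the pattern skip over the tabs and line breaks.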
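
To illustrate the point, here is a minimal sketch of the original pattern with \s* added between the tags and around the link text. It assumes the markup always has the shape shown in the first post; the variable names ($pattern, $links) are only for illustration, and the sample's indentation is shortened here.

```php
<?php
// Sample markup from the first post (indentation shortened).
$html = <<<EOF
<div class="sresult_address">
    <h2>
        2.
        <a href="http://www.mydomain.eu/property-for-sale/Cape-View-House-Restored-Georgian-House-With-Stunning-Sea-Views-London/485539/">
            Cape View House, Restored Georgian House With Stunning Sea Views,London         </a>
    </h2>
</div>
EOF;

// Same pattern as in the question, with \s* wherever the markup may contain whitespace.
$pattern = '~<div class="sresult_address">\s*<h2>\s*[0-9]{1,2}[.]\s*<a href="([^"]*)">\s*(.*?)\s*</a>\s*</h2>\s*</div>~is';

$links = [];
if (preg_match_all($pattern, $html, $matches, PREG_SET_ORDER)) {
    foreach ($matches as $m) {
        // $m[1] is the href, $m[2] is the link text without surrounding whitespace
        $links[] = ['url' => $m[1], 'title' => $m[2]];
    }
}

print_r($links);
```

The \s* on either side of (.*?) keeps the tabs and line breaks out of the captured title. The pattern will still break if the site adds attributes or reorders the tags.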


Regex is not the most suitable tool for tasks like this. One alternative is to use DOMDocument together with DOMXPath:

 

For example:

$html = <<<EOF
<div class="sresult_address">
						<h2>
							2.
																	<a href="http://www.mydomain.eu/property-for-sale/Cape-View-House-Restored-Georgian-House-With-Stunning-Sea-Views-London/485539/">
										Cape View House, Restored Georgian House With Stunning Sea Views,London 										</a>
															</h2>
					</div>
EOF;

$dom = new DOMDocument;
libxml_use_internal_errors(true); # collect markup warnings instead of printing them
$dom->loadHTML($html); # for a live page, change loadHTML to loadHTMLFile and pass the complete URL in quotes
libxml_clear_errors();
libxml_use_internal_errors(false);
$xpath = new DOMXPath($dom);
# select every <a> with an href that sits directly inside an <h2> of the result div
$aTag = $xpath->query('//div[@class="sresult_address"]/h2/a[@href]');

foreach ($aTag as $val) {
    # nodeValue keeps the whitespace around the link text, so trim it
    echo 'URL: ' . $val->getAttribute('href') . "<br />Link Text: " . trim($val->nodeValue) . "<br />\n";
}

 

Granted, as with many other aspects of programming, there's more than one way to skin a cat. While you can use regex for things like this, in my opinion it's akin to using pliers to hammer a nail into the wall (it can work, but it's a tad awkward).

