
regex help


raimis100


I am sure there are many threads on this, as the others have mentioned; I suppose it comes down to the search terms used. Perhaps try terms like 'scrape'?

 

But to give you one example:

$str = 'This is a <abbr title="silly example">string</abbr> that contains a <a href="http://www.somesite.bork/somefile.php"><strong> hyperlink </strong></a>, but you can also visit <a href="http://www.whatever.com/somefile2.php">this link</a> as well.';

// the single quotes inside the pattern are escaped so they don't end the PHP string early
preg_match_all('#<a[^>]*href=[\'"]([^\'"]+)[\'"][^>]*>(.+?)</a>#si', $str, $link);
$href = [];
$linkText = [];
$matchCount = count($link[0]); // $link[0] holds one full match per hyperlink found
for ($a = 0; $a < $matchCount; $a++) {
    $href[] = $link[1][$a]; // stores the value of attribute href into array $href
    $linkText[] = trim(strip_tags($link[2][$a])); // stores hyperlink text into array $linkText
}
echo '<pre>' . print_r($href, true) . '</pre>'; // output array $href
echo '<pre>' . print_r($linkText, true) . '</pre>'; // output array $linkText
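By default preg_match_all() uses PREG_PATTERN_ORDER, so $link[0] holds every full match, $link[1] every href value and $link[2] every link text; the number of links found is therefore count($link[0]). As a sketch, the same pattern with the PREG_SET_ORDER flag returns one array per link instead, which keeps each href next to its text. The function name extractLinksRegex() and the test string are only illustrative.

```php
<?php
// Hypothetical helper: pairs each href with its link text using PREG_SET_ORDER.
function extractLinksRegex(string $html): array
{
    // Same pattern as above; accepts href values in single or double quotes.
    $pattern = '#<a[^>]*href=[\'"]([^\'"]+)[\'"][^>]*>(.+?)</a>#si';
    $links = [];
    // With PREG_SET_ORDER, each $m is [full match, href group, link-text group].
    if (preg_match_all($pattern, $html, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $m) {
            $links[] = [
                'href' => $m[1],
                'text' => trim(strip_tags($m[2])), // drop inner tags like <strong>
            ];
        }
    }
    return $links;
}

// Illustrative input: one double-quoted and one single-quoted href.
$str = 'Visit <a href="http://www.somesite.bork/somefile.php"><strong> hyperlink </strong></a>'
     . ' or <a href=\'http://www.whatever.com/somefile2.php\'>this link</a>.';

print_r(extractLinksRegex($str));
```

Iterating over the result then gives you each link as a self-contained pair, so there is no index bookkeeping across two parallel arrays.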

 

That said, I prefer using DOM/XPath for parsing HTML tags. Using the same $str from the first snippet:

 

$dom = new DOMDocument;
$dom->loadHTML($str); // $str is the HTML string you want to parse
$xpath = new DOMXPath($dom);
$aTag = $xpath->query('//a[@href]');

$href = []; // start empty, in case the regex snippet already filled these arrays
$linkText = [];
foreach ($aTag as $val) {
    $href[] = $val->getAttribute('href'); // stores the value of attribute href into array $href
    $linkText[] = $val->nodeValue; // stores hyperlink text into array $linkText
}
$linkText = array_map('trim', $linkText);
echo '<pre>' . print_r($href, true) . '</pre>';
echo '<pre>' . print_r($linkText, true) . '</pre>';
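One concrete reason to prefer DOM: the regex's [^>]* assumes no attribute value contains a '>' character, which HTML allows inside quoted attributes. The sketch below wraps the DOMDocument/DOMXPath code in a function (extractLinksDom() is an illustrative name) and runs both approaches on such a link. The libxml_use_internal_errors(true) call is an addition that keeps loadHTML() from printing warnings on imperfect markup.

```php
<?php
// Hypothetical helper: collects href values and link texts with DOMDocument + DOMXPath.
function extractLinksDom(string $html): array
{
    $dom = new DOMDocument();
    // Collect libxml parse warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $links = [];
    foreach ($xpath->query('//a[@href]') as $a) {
        $links[] = [
            'href' => $a->getAttribute('href'),
            'text' => trim($a->textContent),
        ];
    }
    return $links;
}

// A quoted attribute value containing ">" (legal HTML), placed before href.
$html = '<p>See <a title="a > b" href="http://www.somesite.bork/somefile.php">this link</a>.</p>';

// The regex never reaches href= here, because [^>]* stops at the ">" inside title.
$regexCount = preg_match_all('#<a[^>]*href=[\'"]([^\'"]+)[\'"][^>]*>(.+?)</a>#si', $html, $m);

$domLinks = extractLinksDom($html);

echo "regex matches: $regexCount\n";
print_r($domLinks);
```

The parser reads the quoted title value as a whole, so the DOM version still finds the link that the regex misses.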

Edit: the posting system detects the placeholder URLs in the href values and wraps them in [url] BBCode tags, so if you see those tags, simply remove them when you copy and paste these snippets to test them.
