raimis100 Posted July 12, 2009
Hello. Can someone please help me write a regex for scraping all hrefs (links) in a string?
wildteen88 Posted July 12, 2009
First, have a search through the posts in the PHP Regex board. There must be hundreds, if not thousands, of posts on this.
.josh Posted July 12, 2009
http://www.phpfreaks.com/forums/index.php/topic,259638.0.html
raimis100 Posted July 12, 2009 Author
This one works, but can someone please write a regex that scrapes both links and anchor text? Links in one array and anchor text in another.
raimis100 Posted July 13, 2009 Author
Still looking for help on this one.
.josh Posted July 13, 2009
First, have a search through the posts in the PHP Regex board. There must be hundreds, if not thousands, of posts on this.
raimis100 Posted July 13, 2009 Author
I have searched a ton of threads, but none of them show how to scrape the anchor text.
nrg_alpha Posted July 13, 2009
I am sure there are many threads on this, as the others have mentioned. I suppose it boils down to the search terms used; perhaps try terms like 'scrape'. But to give you one example:

    $str = 'This is an <abbr title="silly example">string</abbr> contains a <a href="http://www.somesite.bork/somefile.php"><strong> hyperlink </strong></a> but you can also visit <a href="http://www.whatever.com/somefile2.php">this link</a> as well.';

    // Group 1 captures the href value, group 2 the markup between <a ...> and </a>.
    // The single quotes inside the pattern must be escaped within a single-quoted PHP string.
    preg_match_all('#<a[^>]*href=[\'"]([^\'"]+)[\'"][^>]*>(.+?)</a>#si', $str, $link);

    // Count the matches, not the capture groups: $link always holds 3 sub-arrays.
    $arrTotal = count($link[1]);
    for ($a = 0 ; $a < $arrTotal ; $a++) {
        $href[] = $link[1][$a]; // stores the value of attribute href into array $href
        $linkText[] = trim(strip_tags($link[2][$a])); // stores hyperlink text into array $linkText
    }
    echo '<pre>'.print_r($href, true); // output array $href
    echo '<pre>'.print_r($linkText, true); // output array $linkText

But I prefer using DOM / XPath for parsing tags. Assuming we use $str from the first snippet:

    // Start from empty arrays so results from the first snippet are not carried over.
    $href = $linkText = array();

    $dom = new DOMDocument;
    $dom->loadHTML($str); // replace $str with string name in question
    $xpath = new DOMXPath($dom);
    $aTag = $xpath->query('//a[@href]'); // every <a> element that has an href attribute
    foreach ($aTag as $val) {
        $href[] = $val->getAttribute('href'); // stores the value of attribute href into array $href
        $linkText[] = $val->nodeValue; // stores hyperlink text into array $linkText
    }
    $linkText = array_map('trim', $linkText); // strip surrounding whitespace from each link text
    echo '<pre>'.print_r($href, true);
    echo '<pre>'.print_r($linkText, true);
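If the DOM / XPath approach above is going to be reused, it could be wrapped in a small function along these lines. This is only a sketch, not something posted in the thread: the function name `extractLinks` is hypothetical, it assumes PHP 7.1 or later (type declarations and short array destructuring), and it assumes you would rather silence libxml's parser warnings on messy real-world HTML than see them.

```php
<?php
// Hypothetical helper (not from the thread): returns the href values and the
// link texts as two parallel arrays, using the DOM / XPath approach.
function extractLinks(string $html): array
{
    $hrefs = [];
    $texts = [];

    // Scraped HTML is often malformed; collect libxml warnings internally
    // instead of letting loadHTML() emit them.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument;
    // loadHTML() rejects an empty string, so skip parsing in that case.
    if ($html !== '' && $dom->loadHTML($html)) {
        $xpath = new DOMXPath($dom);
        foreach ($xpath->query('//a[@href]') as $a) {
            $hrefs[] = $a->getAttribute('href');
            $texts[] = trim($a->textContent); // text only, nested tags dropped
        }
    }

    // Restore the previous libxml error-handling state.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return [$hrefs, $texts];
}

$str = 'Visit <a href="http://www.somesite.bork/somefile.php"><strong> hyperlink </strong></a> '
     . 'or <a href="http://www.whatever.com/somefile2.php">this link</a>.';

[$href, $linkText] = extractLinks($str);
print_r($href);
print_r($linkText);
```

Returning both arrays from one call keeps each href at the same index as its link text, which matches the "links in one array and anchors in another" request.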