Bodom78 Posted May 18, 2009

Hey Guys and Gals,

What I'm trying to achieve is to pass my HTML content through a plugin that finds every "www." or "http://" address and creates an href from it, without harming links that already exist. I have tried several examples from this forum and various other sites but have had little success. The closest I have come is with an ereg_replace example found in the PHP documentation comments.

The test case below demonstrates the errors I'm having. You can see the output of the script here.

<?php
$target = ' target="_blank"';
$text = 'www.google.com<br />
http://google.com<br />
http://www/.google.com<br /><br />
Below is a manually created href<br />
<a href="http://www.google.com">Visit Google</a><br /><br />
Below is a URL with variables in the address<br />
http://www.google.com.au/search?hl=en&q=php+freaks&btnG=Google+Search&meta=&aq=f&oq=
';

// match protocol://address/path/
$text = ereg_replace("[a-zA-Z]+://([.]?[a-zA-Z0-9_/-])*", "<a href=\"\\0\"$target>\\0</a>", $text);

// match www.something
$text = ereg_replace("(^| |.)(www([.]?[a-zA-Z0-9_/-])*)", "\\1<a href=\"http://\\2\"$target>\\2</a>", $text);

echo $text;
?>

Any advice would be appreciated.

Cheers.

Link to comment https://forums.phpfreaks.com/topic/158601-solved-finding-www-and-creating-hrefs-within-html-text/
nrg_alpha Posted May 18, 2009

Do you mean something along the lines of the following?

$text = 'www.google.com<br />
http://google.com<br />
http://www/.google.com<br /><br />
Below is a manually created href<br />
<a href="http://www.google.com">Visit Google</a><br /><br />
Below is a URL with variables in the address<br />
http://www.google.com.au/search?hl=en&q=php+freaks&btnG=Google+Search&meta=&aq=f&oq=
';

function replaceURL($a){
    return (preg_match('#(?:http://\w+\.|www\.).+#i', $a[0], $match))
        ? str_replace($match[0], '<a href="'.$match[0].'">'.$match[0].'</a>', $a[0])
        : $a[0];
}

$text = preg_replace_callback('#(^|>)[^<]+#', 'replaceURL', $text);
echo $text;

What I have done here is use preg_replace_callback, which looks for anything outside of tags (thus leaving alone any URL inside an anchor tag's attributes, for example). Each piece of text it finds gets passed into the function replaceURL, and if the preg_match pattern is found, the function does the appropriate replacement and returns that.

I did notice the oddball entry 'http://www/.google.com', so I managed to avoid converting that one, as it is not valid, by the use of http://\w+\. in the pattern (this could be revised if needed).

In either case, just note that you should learn PCRE (Perl Compatible Regular Expressions - preg) instead of using ereg, as POSIX (Portable Operating System Interface - ereg) will no longer be included within the core of PHP as of version 6.

You can read up about PCRE here: Phpfreaks regex resources, Phpfreaks regex tutorial, regular expression tutorials, weblogtoolscollection
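To make the "anything outside of tags" step concrete, here is a minimal sketch of what the outer pattern hands to the callback. The $html sample is a shortened, hypothetical version of the test case, and preg_match_all is used here only to list the chunks; it is not part of the posted solution.

```php
<?php
// Hypothetical sample, shortened from the test case in the thread.
$html = 'www.google.com<br /><a href="http://www.google.com">Visit Google</a><br />';

// Same outer pattern as in the post: start of string or '>',
// followed by one or more characters that are not '<'.
preg_match_all('#(^|>)[^<]+#', $html, $chunks);

// Each element is one piece of text that the callback would receive.
print_r($chunks[0]);
```

The href attribute never shows up in a chunk because it sits between '<' and '>'. Note that the text between <a> and </a> is still passed along, so an existing link whose visible text is itself a URL would presumably be wrapped a second time.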
nrg_alpha Posted May 18, 2009

On second thought, you can simply use

#(?:http://[a-z0-9-]+\.|www\.).+#i

as the pattern instead, as domains can't use an underscore (but can use numbers and hyphens, the latter of which I forgot about). While there are additional restrictions (such as not being able to start or end with a hyphen), for all intents and purposes I am assuming the domain names themselves are in the proper format. I was thinking of a-zA-Z0-9 when I used \w, but forgot to take the underscore into account (and missed out on the hyphen).
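A quick way to check what the revised character class accepts is to run the pattern against a few inputs. This is only a sketch: the sample URLs other than the two taken from the test case are made up for illustration.

```php
<?php
$pattern = '#(?:http://[a-z0-9-]+\.|www\.).+#i';

// Sample inputs; the .example hosts are hypothetical.
$samples = [
    'http://google.com',       // plain host name
    'http://my-site.example',  // hyphen is allowed by [a-z0-9-]
    'http://www/.google.com',  // malformed entry from the test case
    'http://my_site.example',  // underscore is not in the class
];

$results = [];
foreach ($samples as $s) {
    $results[$s] = preg_match($pattern, $s) === 1;
    printf("%-24s %s\n", $s, $results[$s] ? 'match' : 'no match');
}
```

With \w in place of [a-z0-9-], the underscore sample would match as well, which is the difference described in the post.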
Bodom78 Posted May 19, 2009

Hey there nrg_alpha,

Thank you so much for the help and quick response. I have implemented your suggested pattern and it works almost perfectly. The remaining bug is that when a www address appears inside a sentence, the words that follow it end up inside the href as well, as illustrated here.

I am currently reading through the Regex Tutorial on the site and was wondering if the "Quantifier Greediness" section is the one I should be focusing on to sort out the current problem?

Cheers
nrg_alpha Posted May 19, 2009

Oh, yeah.. the .+ in the pattern is the issue (I was just going off of the example you gave). You can change .+ to [^\s]+ (which is basically anything that is not whitespace, one or more times).

Making .+ lazy (.+?) wouldn't help here: since it is the last thing in the pattern, there is nothing after it for the regex engine to check, so a lazy quantifier would stop after a single character. So it's a safer bet to match up to the next whitespace (\s is the shorthand class that means 'any whitespace character').

If you run into issues where a URL in a string is followed by punctuation, you can use rtrim to get rid of such punctuation marks in the event they get included in the match.
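The difference between the three variants is easiest to see side by side. The sketch below uses a made-up sentence and a simplified www-only pattern, so treat it as an illustration of the quantifier behaviour rather than the final pattern.

```php
<?php
// Hypothetical sentence with a URL followed by punctuation.
$sentence = 'Visit www.google.com, then come back.';

preg_match('#www\..+#',     $sentence, $greedy);  // greedy: runs to the end of the line
preg_match('#www\..+?#',    $sentence, $lazy);    // lazy at the end of a pattern: one character
preg_match('#www\.[^\s]+#', $sentence, $bounded); // stops at the first whitespace

// rtrim removes the trailing comma or full stop that [^\s]+ picked up.
$url = rtrim($bounded[0], '.,');

echo $greedy[0], "\n", $lazy[0], "\n", $bounded[0], "\n", $url, "\n";
```

The second argument to rtrim is a plain list of characters, so no backslash is needed in front of the dot.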
Bodom78 Posted May 20, 2009

Thanks again nrg_alpha for the help and great explanation.

I did run into another problem but was able to solve it. Basically, in a paragraph of text it was only matching and converting the first URL it found, but continued fine after a break. I had a look through the PHP docs, found preg_match_all, and used that, which seems to be working fine. I also added the http:// prefix when it isn't already in the URL, since the links are off site, and the rtrim() call you suggested to strip a trailing "." or "," after a URL.

Here is the version using preg_match_all and rtrim(), in case someone else requires something similar.

global $target;
$target = ' target="_blank"';
$text = '<p>This paragraph contains multiple URLs, Lorem ipsum dolor sit amet, consectetur adipiscing elit. www.google.com, www.maps.google.com and www.yahoo.com. Duis sit amet bibendum lacus. Mauris libero elit, rutrum cursus mattis vel, pharetra a magna.</p>';

function replaceURL($a) {
    if(preg_match_all('#(?:http://[a-z0-9-]+\.|www\.)[^\s]+#i', $a[0], $match)) {
        global $target;
        for($i=0; $i < count($match[0]); $i++) {
            // Compare the first 7 characters against the scheme
            $prefix = substr($match[0][$i], 0, 7) == 'http://' ? '' : 'http://';
            // Strip trailing commas and full stops picked up from the sentence
            $url = rtrim($match[0][$i], ',.');
            $a[0] = str_replace($url, '<a href="'.$prefix.$url.'"'.$target.'>'.$url.'</a>', $a[0]);
        }
    }
    return $a[0];
}

$text = preg_replace_callback('#(^|>)[^<]+#i', 'replaceURL', $text);
echo $text;
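For anyone adapting this, one possible variation is to do the inner step with a second preg_replace_callback instead of preg_match_all plus str_replace, so each URL is replaced exactly where it was matched and a URL that is a prefix of another (or appears twice) is not rewritten more than once. This is a sketch, not the code from the thread: the function name linkifyText and its $target parameter are hypothetical, and it assumes PHP 7 or later for the type declarations.

```php
<?php
// Hypothetical helper; $target mirrors the attribute string used in the thread.
function linkifyText(string $html, string $target = ' target="_blank"'): string
{
    // Outer pass: text outside tags (start of string or after '>', up to the next '<').
    return preg_replace_callback('#(^|>)[^<]+#', function (array $chunk) use ($target) {
        // Inner pass: replace each URL in place.
        return preg_replace_callback(
            '#(?:http://[a-z0-9-]+\.|www\.)[^\s]+#i',
            function (array $m) use ($target) {
                $url      = rtrim($m[0], '.,');           // drop trailing punctuation
                $trailing = substr($m[0], strlen($url));  // and put it back after the link
                $prefix   = stripos($url, 'http://') === 0 ? '' : 'http://';
                return '<a href="' . $prefix . $url . '"' . $target . '>' . $url . '</a>' . $trailing;
            },
            $chunk[0]
        );
    }, $html);
}

echo linkifyText('<p>See www.google.com, www.google.com/maps and http://php.net.</p>');
```

Like the versions above, this only recognises http:// and www. addresses and does not try to validate the host name any further.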