Bodom78 Posted May 18, 2009

Hey Guys and Gals,

What I'm trying to achieve is to pass my HTML content through a plugin that replaces every "www." or "http://" address with an href, without harming links that already exist. I have tried several examples from this forum and various other sites but have had little success. The closest I have come is with an ereg_replace example found in the PHP documentation comments. The test case below demonstrates the errors I'm having. You can see the output of the script here.

<?php
$target = ' target="_blank"';
$text = 'www.google.com<br />
http://google.com<br />
http://www/.google.com<br /><br />
Below is a manually created href<br />
<a href="http://www.google.com">Visit Google</a><br /><br />
Below is a URL with variables in the address<br />
http://www.google.com.au/search?hl=en&q=php+freaks&btnG=Google+Search&meta=&aq=f&oq=
';

// match protocol://address/path/
$text = ereg_replace("[a-zA-Z]+://([.]?[a-zA-Z0-9_/-])*", "<a href=\"\\0\"$target>\\0</a>", $text);

// match www.something
$text = ereg_replace("(^| |.)(www([.]?[a-zA-Z0-9_/-])*)", "\\1<a href=\"http://\\2\"$target>\\2</a>", $text);

echo $text;
?>

Any advice would be appreciated.

Cheers.
nrg_alpha Posted May 18, 2009

Do you mean something along the lines of this?

$text = 'www.google.com<br />
http://google.com<br />
http://www/.google.com<br /><br />
Below is a manually created href<br />
<a href="http://www.google.com">Visit Google</a><br /><br />
Below is a URL with variables in the address<br />
http://www.google.com.au/search?hl=en&q=php+freaks&btnG=Google+Search&meta=&aq=f&oq=
';

function replaceURL($a){
    return (preg_match('#(?:http://\w+\.|www\.).+#i', $a[0], $match)) ? str_replace($match[0], '<a href="'.$match[0].'">'.$match[0].'</a>', $a[0]) : $a[0];
}

$text = preg_replace_callback('#(^|>)[^<]+#', 'replaceURL', $text);
echo $text;

What I have done here is use preg_replace_callback, which looks for anything outside of tags (thus sparing any URL that sits inside a tag, such as the href of an anchor). Each match is passed to the function replaceURL; if the preg_match pattern is found there, the function does the replacement and returns the result. I did notice the oddball entry 'http://www/.google.com', so I avoided converting that one, as it is not valid, by using http://\w+\. in the pattern (this could be revised if needed).

In either case, note that you should learn PCRE (Perl Compatible Regular Expressions, the preg functions) instead of using ereg, as the POSIX (Portable Operating System Interface) ereg functions were deprecated in PHP 5.3 and removed from the core in PHP 7.

You can read up about PCRE here: Phpfreaks regex resources, Phpfreaks regex tutorial, regular expression tutorials, weblogtoolscollection.
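To make the "anything outside of tags" step concrete, here is a minimal sketch of what the outer pattern hands to the callback. The input string is made up for illustration and is not from the thread.

```php
<?php
// Illustrative input (an assumption, not the poster's test case).
$html = 'See www.example.com <a href="http://www.example.com">here</a> now';

// Same outer pattern as in the post: a run of non-"<" characters that
// starts either at the beginning of the string or right after a ">".
preg_match_all('#(^|>)[^<]+#', $html, $m);

// $m[0] holds the three chunks:
//   'See www.example.com '
//   '>here'
//   '> now'
print_r($m[0]);
```

The href attribute never reaches the callback because it lies between "<" and ">". Note that the visible text of an existing link ("here" in this sketch) is still passed through, so link text that itself looks like a URL would probably need extra handling. Each chunk after the first also carries its leading ">".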
nrg_alpha Posted May 18, 2009

On second thought, you can simply use #(?:http://[a-z0-9-]+\.|www\.).+#i as the pattern instead, as domains can't use an underscore (but can use numbers and hyphens, the latter of which I forgot about). While there are additional restrictions (such as not being able to start or end with a hyphen), for all intents and purposes I am assuming the domain names themselves are in the proper format. I was thinking of a-zA-Z0-9- when I used \w, but forgot that \w also matches the underscore (and does not match the hyphen).
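The difference between \w and [a-z0-9-] can be checked with a small sketch like the one below. The host names are hypothetical, and the patterns are anchored with ^ only to keep the comparison tidy.

```php
<?php
// Hypothetical host names used purely for comparison.
$hosts = ['http://my_site.example', 'http://my-site.example', 'http://site1.example'];

$withWord  = '#^http://\w+\.#i';        // \w = letters, digits and underscore
$withClass = '#^http://[a-z0-9-]+\.#i'; // letters, digits and hyphen

$results = [];
foreach ($hosts as $h) {
    // preg_match returns 1 on a match and 0 otherwise.
    $results[$h] = [preg_match($withWord, $h), preg_match($withClass, $h)];
}

// my_site: [1, 0]   my-site: [0, 1]   site1: [1, 1]
print_r($results);
```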
Bodom78 Posted May 19, 2009 (Author)

Hey there nrg_alpha,

Thank you so much for the help and quick response. I have implemented your suggested pattern and it works almost perfectly. The remaining bug is that when a www address inside a sentence is converted, the rest of the sentence ends up inside the href as well, as illustrated here.

I am currently reading through the Regex Tutorial on the site and was wondering if the "Quantifier Greediness" section is the one I should be focusing on to sort out the current problem?

Cheers
nrg_alpha Posted May 19, 2009

Oh, yeah.. the .+ in the pattern is the issue (I was just going off of the example you gave). You can change .+ to [^\s]+, which matches one or more characters that are not whitespace (\s is the shorthand class for 'any whitespace character'). Making the quantifier lazy (.+?) wouldn't help here: since it is the last thing in the pattern, there is nothing after it for the regex to check, so a lazy .+? would stop after a single character, while the greedy .+ runs on to the end of the line. So it's a safer bet to stop at whitespace instead.

If you run into issues where a URL in a string is followed by punctuation, you can use rtrim to get rid of the punctuation marks that get included.
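A short sketch of the three quantifier choices discussed above, run against a made-up sentence (the sample text and variable names are assumptions for illustration):

```php
<?php
$line = 'Visit www.example.com for details';

preg_match('#www\..+#',  $line, $greedy);  // greedy: runs to the end of the line
preg_match('#www\..+?#', $line, $lazy);    // lazy with nothing after it: one character only
preg_match('#www\.\S+#', $line, $bounded); // \S+ is equivalent to [^\s]+

echo $greedy[0], "\n";  // www.example.com for details
echo $lazy[0], "\n";    // www.e
echo $bounded[0], "\n"; // www.example.com

// Trailing punctuation picked up by \S+ can be trimmed afterwards.
$trimmed = rtrim('www.example.com.', '.,');
echo $trimmed, "\n";    // www.example.com
```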
Bodom78 Posted May 20, 2009 (Author)

Thanks again nrg_alpha for the help and great explanation.

I did run into another problem but was able to solve it. In a paragraph of text it was only matching and converting the first URL it found, but it continued fine after a break. I had a look through the PHP docs, found preg_match_all and used that, which seems to be working fine.

I also added the http:// prefix if it wasn't in the URL, since the links are off site, and the rtrim() call you suggested to fix ".," after a URL.

Here is the version using preg_match_all and rtrim() in case someone else requires something similar.

global $target;
$target = ' target="_blank"';

$text = '<p>This paragraph contains multiple URLs, Lorem ipsum dolor sit amet, consectetur adipiscing elit. www.google.com, www.maps.google.com and www.yahoo.com. Duis sit amet bibendum lacus. Mauris libero elit, rutrum cursus mattis vel, pharetra a magna.</p>';

function replaceURL($a) {
    if(preg_match_all('#(?:http://[a-z0-9-]+\.|www\.)[^\s]+#i', $a[0], $match)) {
        global $target;
        for($i=0; $i < count($match[0]); $i++) {
            // only add the prefix when the match does not already start with http://
            $prefix = stripos($match[0][$i], 'http://') === 0 ? '' : 'http://';
            // strip trailing commas and full stops
            $url = rtrim($match[0][$i], ',.');
            // $target already begins with a space
            $a[0] = str_replace($url, '<a href="'.$prefix.$url.'"'.$target.'>'.$url.'</a>', $a[0]);
        }
    }
    return $a[0];
}

$text = preg_replace_callback('#(^|>)[^<]+#i', 'replaceURL', $text);
echo $text;
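One possible variation on the version above, offered as a sketch rather than as the thread's solution: a second preg_replace_callback replaces each URL at the position where it was found, which avoids str_replace touching the same URL twice when it appears more than once in a chunk. The function name linkifyText and the $target parameter are hypothetical, and the code assumes PHP 7 or later.

```php
<?php
// Hypothetical helper; $target is passed in instead of read from a global.
function linkifyText(string $html, string $target = ' target="_blank"'): string
{
    // Outer pass: only text outside of tags (same pattern as in the thread).
    return preg_replace_callback('#(^|>)[^<]+#', function (array $chunk) use ($target): string {
        // Inner pass: each URL is replaced in place.
        return preg_replace_callback(
            '#(?:http://[a-z0-9-]+\.|www\.)\S+#i',
            function (array $m) use ($target): string {
                $url      = rtrim($m[0], '.,');          // drop trailing "." and ","
                $trailing = substr($m[0], strlen($url)); // keep that punctuation after the link
                $prefix   = stripos($url, 'http://') === 0 ? '' : 'http://';
                return '<a href="' . $prefix . $url . '"' . $target . '>' . $url . '</a>' . $trailing;
            },
            $chunk[0]
        );
    }, $html);
}

echo linkifyText('<p>Try www.google.com, www.maps.google.com and http://www.yahoo.com.</p>');
```

The sketch does not escape the URL for HTML output, so addresses containing "&" would likely want htmlspecialchars() applied before being placed in the href.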