[SOLVED] Finding www. and creating hrefs within HTML text


Hey Guys and Gals,

 

What I'm trying to achieve is to pass my HTML content through a plugin that finds every bare URL starting with "www." or "http://" and wraps it in an href, without harming links that already exist.

 

I have tried several examples from this forum and various other sites but have had little success.

 

The closest I have come is with an ereg_replace example found in the comments of the PHP documentation.

 

Test case code below to demonstrate the errors I'm having. You can see the output of the script here.

 

<?php
$target = ' target="_blank"';

$text = 'www.google.com<br />
	 http://google.com<br />
	 http://www/.google.com<br /><br />
	 Below is a manually created href<br />
	 <a href="http://www.google.com">Visit Google</a><br /><br />
	 Below is a URL with variables in the address<br />
	 http://www.google.com.au/search?hl=en&q=php+freaks&btnG=Google+Search&meta=&aq=f&oq=
	 ';

// match protocol://address/path/
$text = ereg_replace("[a-zA-Z]+://([.]?[a-zA-Z0-9_/-])*", "<a href=\"\\0\"$target>\\0</a>", $text);

// match www.something
$text = ereg_replace("(^| |.)(www([.]?[a-zA-Z0-9_/-])*)", "\\1<a href=\"http://\\2\"$target>\\2</a>", $text);

echo $text;
?>

 

Any advice would be appreciated.

 

Cheers.

Do you mean something along the lines of:

$text = 'www.google.com<br />
	 http://google.com<br />
	 http://www/.google.com<br /><br />
	 Below is a manually created href<br />
	 <a href="http://www.google.com">Visit Google</a><br /><br />
	 Below is a URL with variables in the address<br />
	 http://www.google.com.au/search?hl=en&q=php+freaks&btnG=Google+Search&meta=&aq=f&oq=
	 ';

function replaceURL($a){
return (preg_match('#(?:http://\w+\.|www\.).+#i', $a[0], $match))? str_replace($match[0], '<a href="'.$match[0].'">'.$match[0].'</a>', $a[0]) : $a[0];
}

$text = preg_replace_callback('#(^|>)[^<]+#', 'replaceURL', $text);
echo $text;

?

 

What I have done here is use preg_replace_callback, which looks for anything outside of tags (thus sparing any URL inside an anchor tag's attributes, for example).

Each match then gets passed into the function replaceURL; if the preg_match pattern is found, the function does the appropriate replacement and returns that, otherwise it returns the text unchanged.

I did notice the oddball entry 'http://www/.google.com', so I avoided converting that one, as it is not valid, by using http://\w+\. in the pattern (this could be revised if needed).
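To make it easier to see what the outer pattern hands to the callback, here is a minimal sketch that only lists the matched text runs. The sample string and the $segments variable are made up for illustration and are not part of the code above.

```php
<?php
// Illustrative input: one bare URL, one existing anchor, one URL after the anchor.
$html = 'www.google.com<br /><a href="http://www.google.com">Visit Google</a> see www.php.net';

// Same outer pattern as in the reply: start of string or '>', then everything up to the next '<'.
preg_match_all('#(^|>)[^<]+#', $html, $segments);

// $segments[0] holds each full match, i.e. what replaceURL() would receive as $a[0].
print_r($segments[0]);
```

Note that the href attribute never shows up in the list, which is why existing links survive. The link text ("Visit Google") is still passed to the callback, so if an anchor's visible text were itself a URL it would presumably get wrapped a second time.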

 

In either case, just note that you should learn PCRE (Perl Compatible Regular Expressions, the preg functions) instead of using ereg: the POSIX (Portable Operating System Interface) ereg functions were deprecated in PHP 5.3 and removed in PHP 7.0.
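As a rough illustration of the switch, the first ereg_replace call from the original post could be translated to preg_replace along these lines. This is a sketch of the mechanical conversion only (delimiters added, group made non-capturing); it keeps all the limitations of the original pattern, and the sample $text and the $linked variable are assumptions for the example.

```php
<?php
$target = ' target="_blank"';
$text   = 'see http://google.com now';

// The ereg pattern is wrapped in # delimiters; $0 in the replacement is the whole match,
// playing the role of \\0 in the ereg_replace version.
$linked = preg_replace(
    '#[a-zA-Z]+://(?:[.]?[a-zA-Z0-9_/-])*#',
    '<a href="$0"' . $target . '>$0</a>',
    $text
);

echo $linked;
// see <a href="http://google.com" target="_blank">http://google.com</a> now
```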

 

You can read up about PCRE here:

Phpfreaks regex resources

Phpfreaks regex tutorial

regular expression tutorials

weblogtoolscollection

 

 

 

On second thought, you can simply use #(?:http://[a-z0-9-]+\.|www\.).+#i as the pattern instead, as domains can't use an underscore (but can use numbers and hyphens, the latter of which I forgot about).

While there are additional restrictions (such as not being able to start or end with a hyphen), for all intents and purposes I am assuming the domain names themselves are in the proper format. I was thinking of a-zA-Z0-9 when I used \w, but forgot that \w also matches the underscore (and does not match the hyphen).
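If the hyphen restriction ever matters, a single domain label could be checked with something like the sketch below. The function name isValidLabel is hypothetical, and it only covers the rules mentioned in this thread (letters, digits, hyphens, no hyphen at either end), not every rule for real domain names.

```php
<?php
// Hypothetical helper: true when $label uses only letters, digits and hyphens
// and neither starts nor ends with a hyphen.
function isValidLabel($label)
{
    return preg_match('#^[a-z0-9](?:[a-z0-9-]*[a-z0-9])?$#i', $label) === 1;
}

var_dump(isValidLabel('google'));      // bool(true)
var_dump(isValidLabel('my-site'));     // bool(true)
var_dump(isValidLabel('-bad'));        // bool(false)
var_dump(isValidLabel('bad-'));        // bool(false)
var_dump(isValidLabel('under_score')); // bool(false)
```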

Hey there nrg_alpha,

 

Thank you so much for the help and quick response.

 

I have implemented your suggested pattern and it works almost perfectly.

 

The remaining bug is that when a www address inside a sentence is converted, the rest of the sentence gets pulled into the href as well, as illustrated here.

 

I am currently reading through the Regex Tutorial on the site and was wondering if the "Quantifier Greediness" section is the one I should be focusing on to sort out the current problem?

 

Cheers

Oh, yeah, the .+ in the pattern is the issue (I was just going off of the example you gave).

You can change .+ to [^\s]+ (which is basically any character that is not whitespace, one or more times).

 

In this case, since .+ is the last thing in the pattern, making it lazy (.+?) wouldn't help: there is nothing after it for the regex to check, so a lazy quantifier would stop after a single character, while the greedy version runs on to the end of the line. So it's a safer bet to stop at whitespace (the shorthand class \s means 'any whitespace character'). If you run into issues where a URL in a string is directly followed by punctuation, you can use rtrim to strip such punctuation marks in the event they get included.
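A quick way to compare the quantifier choices is to run each one against the same sentence. This is only a sketch; the sample sentence and the variable names are made up for the demonstration.

```php
<?php
$line = 'Visit www.google.com, then read on.';

preg_match('#www\..+#i', $line, $greedy);       // greedy: runs to the end of the line
preg_match('#www\..+?#i', $line, $lazy);        // lazy with nothing after it: one character
preg_match('#www\.[^\s]+#i', $line, $bounded);  // stops at the first whitespace

echo $greedy[0], "\n";                 // www.google.com, then read on.
echo $lazy[0], "\n";                   // www.g
echo $bounded[0], "\n";                // www.google.com,
echo rtrim($bounded[0], ',.'), "\n";   // www.google.com
```

The [^\s]+ version still picks up the trailing comma, which is where rtrim comes in.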

 

Thanks again nrg_alpha for the help and great explanation.

 

I did run into another problem but was able to solve it. Basically, in a paragraph of text it was only matching and converting the first URL it found, but it continued fine after a break.

 

I had a look through the PHP docs, found preg_match_all, and used that, which seems to be working fine.

 

I also added the http:// prefix when it is missing from the URL, since the links are off site, and the rtrim() call you suggested to strip a trailing "." or "," after a URL.

 

Here is the version using preg_match_all and rtrim(), in case someone else requires something similar.

 

$target = ' target="_blank"';

$text = '<p>This paragraph contains multiple URLs, Lorem ipsum dolor sit amet, consectetur adipiscing elit. www.google.com, www.maps.google.com and www.yahoo.com. Duis sit amet bibendum lacus. Mauris libero elit, rutrum cursus mattis vel, pharetra a magna.</p>';

// Called once per run of text between tags; $a[0] is that text.
function replaceURL($a)
{
	global $target;
	if (preg_match_all('#(?:http://[a-z0-9-]+\.|www\.)[^\s]+#i', $a[0], $match))
	{
		for ($i = 0; $i < count($match[0]); $i++)
		{
			// strip trailing commas and full stops picked up by [^\s]+
			$url    = rtrim($match[0][$i], ',.');
			// add http:// only when the URL does not already start with it
			$prefix = strncasecmp($url, 'http://', 7) == 0 ? '' : 'http://';
			// $target already begins with a space
			$a[0]   = str_replace($url, '<a href="'.$prefix.$url.'"'.$target.'>'.$url.'</a>', $a[0]);
		}
	}
	return $a[0];
}

$text = preg_replace_callback('#(^|>)[^<]+#i', 'replaceURL', $text);
echo $text;
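One thing to watch with the str_replace() approach: it rewrites every occurrence of the URL on each pass, so a URL that appears twice in the same text run would get wrapped twice. A possible variation, sketched below, replaces each match in place with a nested preg_replace_callback instead. The function names linkify and linkifyText are hypothetical, and the closures assume PHP 5.3 or later.

```php
<?php
// Hypothetical helper: wrap each bare URL found in one run of text.
function linkifyText($segment, $target = ' target="_blank"')
{
    return preg_replace_callback(
        '#(?:http://[a-z0-9-]+\.|www\.)[^\s]+#i',
        function ($m) use ($target) {
            $url  = rtrim($m[0], ',.');              // URL without trailing punctuation
            $tail = substr($m[0], strlen($url));     // the punctuation that was trimmed off
            $href = stripos($url, 'http://') === 0 ? $url : 'http://' . $url;
            return '<a href="' . $href . '"' . $target . '>' . $url . '</a>' . $tail;
        },
        $segment
    );
}

// Hypothetical wrapper: apply linkifyText() only to text outside of tags.
function linkify($html)
{
    return preg_replace_callback('#(^|>)[^<]+#', function ($a) {
        return linkifyText($a[0]);
    }, $html);
}

echo linkify('<p>www.google.com, www.maps.google.com and www.google.com.</p>');
```

Each match is replaced exactly once, so repeated URLs come out as separate, correctly formed links and the trimmed punctuation is put back after the closing tag.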
