1 URL RegEx to Rule them all...?


The14thGOD

Sorry couldn't resist.

I'm looking for a preg_replace that matches all the ways a URL can be written and replaces it with a working link (yep...)

 

Here's what I got so far.

<?php
$row['body'] = preg_replace('/^(https?:\/\/)|(www.)?([a-z0-9\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/','<a href="\\1\\2" target="_blank">\\2</a>',$row['body']);
?>

I'm not sure how to make it so that the http and www parts can both be there, only one of them, or neither. I'm not sure I even have it written right (probably not).

 

I also am not sure how to write the 2nd part since the http(s)/www is optional.

 

I think it could be like this, but it is kind of long, I'm assuming it could be chopped down a bit?

<?php
   $row['body'] = preg_replace('/^(https?:\/\/|https?:\/\/www.|www.)?([a-z0-9\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/','<a href="\\1\\2" target="_blank">\\2</a>',$row['body']);
?>

 

Any help/improvements are greatly appreciated.

 

Justin

Link to comment
Share on other sites

Aah, the ever returning URL regex ;) I once tried to write my own, but gave up when I got to the last components. Got some parts right though, so have a look if you want:

 

<?php
function is_url($url = false) {
if ($url === false) {
	return false;
}
$filefolderchars = '[a-z0-9+´!"¤%&()=`@£$€^¨\~*\';,.-]';
$pattern =
	'~'. #opening pattern delimiter
	'^'. #start of string
	'[a-z][a-z0-9+.-]*://'. #scheme
	''. #userinfo (optional) (not implemented)
	'(?:'. #hostname/IP
		'((?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z]{2,6})'. #hostname (1 or more subdomains, length 1-63; TLD, length 2-6)
		'|'. #or
		'(?:(?:(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9])\.){3}(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9]))'. #IP
	')'.
	'(?::(?:6553[0-5]|655[0-2][0-9]|65[0-4][0-9]{2}|6[0-4][0-9]{3}|[1-5][0-9]{4}|[1-9][0-9]{0,3}|0))?'. #port (range 0-65535) (optional)
	'/?'. #trailing slash (optional)
	'(?:'. #rest is optional
		'(?<=/)'. #must be preceded by a slash
		'(?:' . $filefolderchars . '+/?)*'. #path and filename (optional)
		#'(?:;(?:[^=;]+(?:=[^;]+)?)+)?'. #parameter
	')?'.
	'$'. #end of string
	'~'. #closing pattern delimiter
	'iD'; #pattern modifiers (case-insensitivity, $ end-only)
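$test = preg_match($pattern, $url, $matches);
// $matches[1] holds the hostname; it is not set when the IP alternative matched
$hostname = isset($matches[1]) ? $matches[1] : '';
if ($test && (strlen($hostname) <= 253)) {
	return true;
} else {
	return false;
}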
}
?>

Link to comment
Share on other sites

I don't have the time to help you with your URL regexp, but I can offer a piece of advice.  Whenever I try to write a real nasty regexp I break it apart into pieces.

 

$protocol = '(http|https)';
$domain = '(regexp_to_match_domain)';

$url_regexp = "/{$protocol}{$domain}/"; // then combine them together

 

It will take a bit of time to get it right, but in doing it this way you divide and conquer and can easily test any one of the individual parts.
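
To make the divide-and-conquer idea above a bit more concrete, here is a minimal sketch. The fragment names ($scheme, $www, $host, $path) and the patterns themselves are assumptions for illustration only, not a complete URL grammar. The ~ delimiter is used so slashes don't need escaping.

```php
<?php
// Each piece is a plain string fragment, so it can be tested on its own.
$scheme = '(?:https?://)?';               // optional http:// or https://
$www    = '(?:www\.)?';                   // optional www. prefix
$host   = '(?:[a-z0-9-]+\.)+[a-z]{2,6}';  // illustrative host pattern, not exhaustive
$path   = '(?:/\S*)?';                    // optional path, up to the next whitespace

// Test a single fragment in isolation by anchoring it.
var_dump(preg_match('~^' . $host . '$~i', 'example.co.uk'));

// Then combine the fragments into the full pattern.
$url_regexp = '~^' . $scheme . $www . $host . $path . '$~i';
var_dump(preg_match($url_regexp, 'https://www.example.com/stuff/hi.html'));
var_dump(preg_match($url_regexp, 'not a url'));
```

If one fragment misbehaves, only that line needs changing, and the anchored single-fragment test shows quickly whether the change helped.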

 

Also, read the URI specification (RFC 3986, published by the IETF).

Link to comment
Share on other sites

I'd use something like

 

~(https?://)?(www\.)?([a-zA-Z0-9]+?\.)?[a-zA-Z0-9]+\.[a-zA-Z]{2,3}(\.[a-zA-Z]{2,3})?(.+)?$~

 

So it makes http:// (with or without the s) optional and www. optional, then comes an optional subdomain, then the main domain followed by a dot and the .com or .net or whatever, then an optional country code, and finally it matches anything that comes after.
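
To see how optional groups like these behave, here is a rough sketch that reports which parts were actually present. The pattern is a simplified variant written for this example (an assumption, not the exact one from the post), and describe_url() is a hypothetical helper name.

```php
<?php
// Illustrative pattern: optional scheme, optional www., a host, an optional rest.
$pattern = '~^(https?://)?(www\.)?([a-z0-9-]+(?:\.[a-z0-9-]+)*\.[a-z]{2,6})(/\S*)?$~i';

// Returns the captured parts, or null when the input does not match at all.
function describe_url(string $pattern, string $input): ?array {
    if (!preg_match($pattern, $input, $m)) {
        return null;
    }
    return [
        'scheme' => $m[1] ?? '', // empty when no http:// or https:// was given
        'www'    => $m[2] ?? '', // empty when there was no www. prefix
        'host'   => $m[3],
        'rest'   => $m[4] ?? '', // trailing unmatched groups are not set at all
    ];
}

print_r(describe_url($pattern, 'example.com'));
print_r(describe_url($pattern, 'http://www.example.co.uk/a'));
```

An optional group that did not take part in the match comes back as an empty string, or is missing entirely if it is the last group, which is why the sketch uses ?? '' as a fallback.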

Link to comment
Share on other sites

Thank you all for your replies. Sorry, I haven't had internet for the last couple of days so I was unable to look at these and respond in a reasonable amount of time.

 

Garethp, this looks pretty good (I'm not amazing at RegEx, and I'll have to look up some things again to fully understand it).

 

roopurt18, that's a good idea and a lot easier to read haha.

 

thebadbad, thank you, when I have a chance I might dip deeper into this, though I don't know if I'll need that much URL validation ^_^.

 

Does anyone have any suggestions on how I could put this together as a hyperlink (html)? I'd rather avoid the full url as the actual link text cause it can look kinda ugly. I'd like it to be something like:

 

URL: http://www.somerandomsite.com/stuff/hi.html

<a href="http://www.somerandomsite.com/stuff/hi.html">somerandomsite.com/stuff/hi.html</a>

 

Slightly better to look at. It would be ideal to just fit it into the text (instead of being the url it's shortened to "this site" or something) but I don't think that's possible...I can think of a way but I don't think it would be very user friendly and would probably be more work than it's worth. CMS's are fun!
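
One possible way to get the shortened link text described above is preg_replace_callback, sketched roughly below. linkify() is a hypothetical helper name, the pattern is deliberately simple, and the sketch assumes the text has already been HTML-escaped and that http:// is an acceptable default when no scheme was typed.

```php
<?php
// Hypothetical helper: wraps URLs in <a> tags and shows them without the
// scheme or the www. prefix. Assumes $text is already HTML-escaped.
function linkify(string $text): string {
    // Groups: 1 = optional scheme, 2 = optional www., 3 = host plus optional path
    $pattern = '~\b(https?://)?(www\.)?((?:[a-z0-9-]+\.)+[a-z]{2,6}(?:/[^\s<]*)?)~i';

    return preg_replace_callback($pattern, function (array $m): string {
        $scheme = $m[1] !== '' ? $m[1] : 'http://'; // assumed default scheme
        $href   = $scheme . $m[2] . $m[3];
        return '<a href="' . $href . '" target="_blank">' . $m[3] . '</a>';
    }, $text);
}

echo linkify('See http://www.somerandomsite.com/stuff/hi.html for details');
```

Because the pattern is so loose it will also link things like file names (for example hi.html on its own), so it would need tightening before going anywhere near real user content.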

Link to comment
Share on other sites
