
[SOLVED] Regex URL Validator


ArizonaJohn


Hi,

 

The regex URL validator below works fairly well. It definitely rejects plain words. However, it allows sub-domains longer than 63 characters, which is disappointing since I thought the "{1,63}" would block that.

 

Any ideas why this is still allowing sub-domains longer than 63 characters?

 

Thanks in advance,

 

John

 


<?php
// Checks if string is a URL
// @param string $url
// @return bool
function isURL($url = NULL) {
        if($url==NULL) return false;

        $protocol = '(http://|https://)';
        $allowed = '([a-z0-9]([-a-z0-9]*[a-z0-9]+)?)';

        $regex = "^". $protocol . // must include the protocol
                         '(' . $allowed . '{1,63}\.)+'. // 1 or several sub domains with a max of 63 chars
                         '[a-z]' . '{2,6}'; // followed by a TLD
        if(eregi($regex, $url)==true) return true;
        else return false;
}

?>


Can't you just use something like this?

<?php
// Checks if string is a URL
// @param string $url
// @return bool
function isURL($url = NULL) {
        if($url==NULL) return false;

        $protocol = '(http://|https://)';
        $allowed = '[-a-z0-9]{1,63}';

        $regex = '~^' . $protocol . // must include the protocol
                         '(' . $allowed . '\.)+'. // 1 or several sub domains with a max of 63 chars each
                         '[a-z]' . '{2,6}' . // followed by a TLD
                         '~i'; // case-insensitive, as eregi() was
        // eregi() was removed in PHP 7, so preg_match() is used instead
        return preg_match($regex, $url) === 1;
}

?>


@OP

You realize that your code only checks the first part of the provided $url, right? E.g.

 

http://google.dcmfaevvcHUHI#¤¤%"##¤!#¤JKH////DA   SBJMCBNM=!?=)(/&%¤<script>alert('attack');</script>

 

would return valid.
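
To make that concrete, here is a minimal sketch of the difference an end anchor makes. The two patterns are simplified stand-ins written for this illustration (they are not the code from the posts), and the variable names and example.com URLs are assumptions.

```php
<?php
// Simplified, hypothetical patterns for illustration only.
$unanchored = '~^https?://([a-z0-9-]{1,63}\.)+[a-z]{2,6}~i';     // anchored at the start only
$anchored   = '~^https?://([a-z0-9-]{1,63}\.)+[a-z]{2,6}/?$~iD'; // must also end right after the TLD

$clean = 'http://www.example.com';
$dirty = "http://www.example.com<script>alert('attack');</script>";

var_dump(preg_match($unanchored, $clean) === 1); // bool(true)
var_dump(preg_match($unanchored, $dirty) === 1); // bool(true): the trailing markup is never examined
var_dump(preg_match($anchored, $clean) === 1);   // bool(true)
var_dump(preg_match($anchored, $dirty) === 1);   // bool(false)
```

Without `$`, the engine stops as soon as the prefix matches, so everything after the TLD goes unchecked. The `D` modifier additionally keeps `$` from matching just before a trailing newline.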

 

If you want to check if $url is a valid URL without the components userinfo, path, filename, query etc., I wrote this:

 

<?php
function is_url($url = false) {
	if (!is_string($url)) {
		return false;
	}
	$pattern =
		'~'. #opening pattern delimiter
		'^'. #start of string
		'[a-z][a-z\d+.-]*://'. #scheme
		'('. #hostname/IP
			'(([a-z\d]([a-z\d-]{0,61}[a-z\d])?\.)+[a-z]{2,6})'. #hostname (1 or more subdomains, length 1-63 each; TLD, length 2-6)
			'|'. #or
			'(((25[0-5]|2[0-4]\d|1\d\d|[1-9]\d|\d)\.){3}(25[0-5]|2[0-4]\d|1\d\d|[1-9]\d|\d))'. #IP
		')'.
		'(:(6553[0-5]|655[0-2]\d|65[0-4]\d\d|6[0-4]\d{3}|[1-5]\d{4}|[1-9]\d{0,3}|0))?'. #port (range 0-65535) (optional)
		'/?'. #trailing slash (optional)
		'$'. #end of string
		'~'. #closing pattern delimiter
		'iD'; #pattern modifiers (case-insensitivity, $ end-only)
	#$matches[2] is the hostname (empty string when an IP matched); max total length 253
	if (preg_match($pattern, $url, $matches) && (strlen($matches[2]) <= 253)) {
		return true;
	} else {
		return false;
	}
}
?>

It allows one or more valid named subdomains, each at most 63 characters long, and the full domain (e.g. mail.google.com) must not be longer than 253 characters. It also allows an IPv4 address in dotted-decimal notation (e.g. 127.0.0.1), and an optional port. I tried to include all components in the check, but couldn't get my head around it, as it gets quite tricky at the end. I might try again later though.
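
As for the original question of why "{1,63}" did not cap the length: in the first post the quantifier is attached to the whole `$allowed` group, so it limits how many times the group repeats, not how many characters it matches. A single repetition can already swallow a label of any length. Here is a rough sketch of the difference. Both patterns are trimmed-down versions written for this illustration (with `$` added), and the `accepts()` helper and the test labels are hypothetical.

```php
<?php
// Hypothetical helper: does $pattern accept http://<label>.com ?
function accepts(string $pattern, string $label): bool {
    return preg_match($pattern, 'http://' . $label . '.com') === 1;
}

// Quantifier on the group: {1,63} counts repetitions of the group, and one
// repetition may match arbitrarily many characters because of the inner *.
$groupQuantified = '~^https?://(([a-z0-9]([-a-z0-9]*[a-z0-9]+)?){1,63}\.)+[a-z]{2,6}$~i';

// Quantifier on the characters: 1 leading char + up to 61 middle chars + 1 trailing char.
$charQuantified = '~^https?://([a-z\d]([a-z\d-]{0,61}[a-z\d])?\.)+[a-z]{2,6}$~i';

$label63 = str_repeat('a', 63); // longest label that should pass
$label70 = str_repeat('a', 70); // too long, should be rejected

var_dump(accepts($groupQuantified, $label70)); // bool(true): the limit is not enforced
var_dump(accepts($charQuantified, $label70));  // bool(false)
var_dump(accepts($charQuantified, $label63));  // bool(true)
```

Putting the count directly on the character class, as in the `{0,61}` version, is what ties the limit to the label length.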

