
[SOLVED] Regex URL Validator


ArizonaJohn


Hi,

The regex URL validator below works fairly well. It definitely rejects plain words. However, it still accepts subdomains longer than 63 characters, which is disappointing, since I thought the "{1,63}" quantifier would block that.

Any ideas why this is still allowing subdomains longer than 63 characters?

Thanks in advance,

John


<?php
// Checks if string is a URL
// @param string $url
// @return bool
function isURL($url = NULL) {
        if ($url == NULL) return false;

        $protocol = '(http://|https://)';
        $allowed = '([a-z0-9]([-a-z0-9]*[a-z0-9]+)?)';

        $regex = '^' . $protocol . // must include the protocol
                 '(' . $allowed . '{1,63}\.)+' . // 1 or several sub domains with a max of 63 chars
                 '[a-z]' . '{2,6}'; // followed by a TLD
        // eregi() was removed in PHP 7.0; preg_match() with the "i" modifier is the
        // case-insensitive equivalent (~ is the delimiter because the pattern contains "//")
        return preg_match('~' . $regex . '~i', $url) === 1;
}

?>
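A likely explanation for the question, not spelled out in the replies: in the pattern above, "{1,63}" is attached to the whole `$allowed` group, so it means "repeat this label group 1 to 63 times", not "1 to 63 characters". Because of the `*` inside the group, a single repetition can already be any length, so no length limit is enforced. The minimal sketch below (variable names and test strings are made up for illustration) compares that pattern with one that puts the limit inside the label: one leading character, at most 61 middle characters and one trailing character. The alternative also uses `https?://` (equivalent to the original protocol group) and is anchored with `$` plus the `D` modifier so the whole string is checked.

```php
<?php
// Pattern from the question: {1,63} repeats the whole label group 1-63 times,
// and each repetition may be arbitrarily long because of the '*'.
$label    = '([a-z0-9]([-a-z0-9]*[a-z0-9]+)?)';
$original = '~^(http://|https://)(' . $label . '{1,63}\.)+[a-z]{2,6}~i';

// Illustrative alternative: the length limit lives inside the label.
// 1 leading char + up to 61 middle chars + 1 trailing char = 1..63 chars.
$cappedLabel = '[a-z0-9]([-a-z0-9]{0,61}[a-z0-9])?';
$capped      = '~^https?://(' . $cappedLabel . '\.)+[a-z]{2,6}$~iD';

$tooLong = 'http://' . str_repeat('a', 64) . '.com'; // 64-character label
$ok      = 'http://' . str_repeat('a', 63) . '.com'; // 63-character label

var_dump(preg_match($original, $tooLong)); // int(1): accepted despite 64 chars
var_dump(preg_match($capped, $tooLong));   // int(0): rejected
var_dump(preg_match($capped, $ok));        // int(1): accepted
```

The same "limit inside the label" construction, `[a-z\d]([a-z\d-]{0,61}[a-z\d])?`, is what the is_url() function further down the thread uses.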

https://forums.phpfreaks.com/topic/162001-solved-regex-url-validator/

Can't you just use something like this?

<?php
// Checks if string is a URL
// @param string $url
// @return bool
function isURL($url = NULL) {
        if ($url == NULL) return false;

        $protocol = '(http://|https://)';
        $allowed = '[-a-z0-9]{1,63}'; // here {1,63} applies to the character class, i.e. 1-63 characters

        $regex = '^' . $protocol . // must include the protocol
                 '(' . $allowed . '\.)+' . // 1 or several sub domains with a max of 63 chars
                 '[a-z]' . '{2,6}'; // followed by a TLD
        // eregi() was removed in PHP 7.0; preg_match() with the "i" modifier replaces it
        return preg_match('~' . $regex . '~i', $url) === 1;
}

?>

@OP

You realize that your code only checks the beginning of the provided $url, right? For example,

http://google.dcmfaevvcHUHI#¤¤%"##¤!#¤JKH////DA   SBJMCBNM=!?=)(/&%¤<script>alert('attack');</script>

would be reported as valid.
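To illustrate why: a pattern that starts with `^` but has no closing `$` only has to match some prefix of the string, so anything after a plausible-looking host is ignored. A minimal sketch, reusing the pattern from the question but calling `preg_match()` (since `eregi()` no longer exists in PHP 7+); the variable names and the test string are made up for this example:

```php
<?php
// Host part of the pattern from the question (it has '^' but no '$').
$host       = '(([a-z0-9]([-a-z0-9]*[a-z0-9]+)?){1,63}\.)+[a-z]{2,6}';
$prefixOnly = '~^(http://|https://)' . $host . '~i';

// Adding '$' (with the D modifier, so a trailing "\n" is not accepted either)
// forces the match to run all the way to the end of the string.
$anchored   = '~^(http://|https://)' . $host . '$~iD';

$junk = 'http://google.com<script>alert(1)</script>';

var_dump(preg_match($prefixOnly, $junk));             // int(1): only "http://google.com" was checked
var_dump(preg_match($anchored, $junk));               // int(0)
var_dump(preg_match($anchored, 'http://google.com')); // int(1)
```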

If you want to check whether $url is a valid URL consisting only of a scheme, a host and an optional port (i.e. without userinfo, path, filename, query, etc.), I wrote this:

<?php
function is_url($url = false) {
	if ($url === false) {
		return false;
	}
	$pattern =
		'~'. #opening pattern delimiter
		'^'. #start of string
		'[a-z][a-z\d+.-]*://'. #scheme
		'('. #hostname/IP
			'(([a-z\d]([a-z\d-]{0,61}[a-z\d])?\.)+[a-z]{2,6})'. #hostname (1 or more labels, each 1-63 chars long; TLD, 2-6 chars long)
			'|'. #or
			'(((25[0-5]|2[0-4]\d|1\d\d|[1-9]\d|\d)\.){3}(25[0-5]|2[0-4]\d|1\d\d|[1-9]\d|\d))'. #IPv4 address
		')'.
		'(:(6553[0-5]|655[0-2]\d|65[0-4]\d\d|6[0-4]\d{3}|[1-5]\d{4}|[1-9]\d{0,3}|0))?'. #port (range 0-65535) (optional)
		'/?'. #trailing slash (optional)
		'$'. #end of string
		'~'. #closing pattern delimiter
		'iD'; #pattern modifiers (case-insensitivity, $ end-only)
	// $matches[2] is the hostname alternative (an empty string when an IP matched),
	// so the 253-character limit only applies to host names
	return preg_match($pattern, $url, $matches) && strlen($matches[2]) <= 253;
}
?>

It accepts one or more labels (subdomains) of 1 to 63 characters each, followed by a TLD of 2 to 6 letters, and the full host name (e.g. mail.google.com) must not be longer than 253 characters. It also accepts an IPv4 address in dotted-decimal notation (e.g. 127.0.0.1) and an optional port. I tried to include all the other URL components in the check too, but couldn't get my head around it, as it gets quite tricky toward the end. I might try again later, though.
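One alternative for the "all components" case, not from the thread and offered only as a sketch, is to let PHP's built-in `parse_url()` split the URL and apply the hostname rules to the host alone. The function name `is_url_with_path()` is hypothetical. It assumes only `http`/`https` should be accepted, uses `filter_var()` with `FILTER_VALIDATE_IP` and `FILTER_FLAG_IPV4` for IPv4 hosts, and, because `parse_url()` splits the string but does not validate it, it does not check the characters in the path or query at all.

```php
<?php
// Hypothetical helper: parse_url() splits the URL into components, and the
// label/length rules from is_url() are applied to the host only.
function is_url_with_path(string $url): bool
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        return false; // badly malformed, or no scheme/host (e.g. "google.com")
    }
    if (!in_array(strtolower($parts['scheme']), ['http', 'https'], true)) {
        return false; // assumption: only http and https are wanted
    }
    $host = $parts['host'];
    if (filter_var($host, FILTER_VALIDATE_IP, FILTER_FLAG_IPV4) !== false) {
        return true; // dotted-decimal IPv4 host, e.g. 127.0.0.1
    }
    if (strlen($host) > 253) {
        return false; // full host name too long
    }
    // Labels of 1-63 characters, then a TLD of 2-6 letters (as in is_url()).
    $label = '[a-z\d]([a-z\d-]{0,61}[a-z\d])?';
    return preg_match('~^(' . $label . '\.)+[a-z]{2,6}$~iD', $host) === 1;
}

var_dump(is_url_with_path('http://mail.google.com/path?q=1'));             // bool(true)
var_dump(is_url_with_path('http://' . str_repeat('a', 64) . '.com/'));     // bool(false)
```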

Archived

This topic is now archived and is closed to further replies.

×
×
  • Create New...

Important Information

We have placed cookies on your device to help make this website better. You can adjust your cookie settings, otherwise we'll assume you're okay to continue.