
Best way of getting URLs from users and validating them


thara


Hello...

 

I have a form that collects several URLs from users, e.g. web address, Facebook address, Twitter address, Google+ address, etc. My problem is how to validate these URLs when the form is submitted. I have tried validating them in PHP with FILTER_VALIDATE_URL and with a plain regular expression.

 

I would like to know the best way to collect URLs like these from users. Is it always a good idea to make them enter the protocol? They may not know whether it is http, https, ftp, ftps, etc., and I think that is hard for some users.

 

I tried something like this with FILTER_VALIDATE_URL, but it always requires a protocol, and sometimes I am confused about how it works.

 

 

// validate url
$url = 'http://www.example.com';

if (filter_var( $url, FILTER_VALIDATE_URL)){
   echo "<br>valid";
} else {
   echo "<br>invalid";
}

OUTPUT : valid

 

 

// validate url
$url = 'hp://www.example.com';

if (filter_var( $url, FILTER_VALIDATE_URL)){
   echo "<br>valid";
} else {
   echo "<br>invalid";
}

OUTPUT : valid

 

 

// validate url
$url = 'http://example.com';

if (filter_var( $url, FILTER_VALIDATE_URL)){
   echo "<br>valid";
} else {
   echo "<br>invalid";
}

OUTPUT : valid

 

// validate url
$url = 'http://example.com?id=32&name=kamalani';

if (filter_var( $url, FILTER_VALIDATE_URL)){
   echo "<br>valid";
} else {
   echo "<br>invalid";
}

OUTPUT : valid
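The results above show that FILTER_VALIDATE_URL checks syntax only, so any well-formed scheme such as hp:// passes. One possible way to tighten this is to follow the filter with an explicit scheme whitelist. The sketch below is only an illustration: the function name is_valid_web_url and the default list of allowed schemes are assumptions, not something from this thread.

```php
<?php
// Hypothetical helper: syntax check with filter_var, then a scheme whitelist.
function is_valid_web_url(string $url, array $allowedSchemes = ['http', 'https']): bool
{
    // Reject anything that is not a syntactically valid URL.
    if (filter_var($url, FILTER_VALIDATE_URL) === false) {
        return false;
    }

    // filter_var accepts any scheme, so compare it against the whitelist.
    $scheme = parse_url($url, PHP_URL_SCHEME);

    return is_string($scheme) && in_array(strtolower($scheme), $allowedSchemes, true);
}

var_dump(is_valid_web_url('http://www.example.com')); // bool(true)
var_dump(is_valid_web_url('hp://www.example.com'));   // bool(false)
```

Whether ftp or other schemes belong in the whitelist depends on what the form is meant to accept.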

 

 

Can you tell me the best ways to get URLs from users and how to validate them?

Any comments are greatly appreciated.

 

Thank you.


As you noted the VALIDATE_EMAIL filter for PHP isn't quite as good as it should have been, causing some false positives (and false negatives). However, in a recent reply to another thread I've posted an updated version of my own validation function, which does pretty well:

http://forums.phpfreaks.com/topic/273066-how-to-convert-links-to-anchor-tags/#entry1405224

 

I'm sure there are some corner cases which I haven't taken into consideration yet, but for the test cases provided (which covers just about any legit URL I can think of) it works perfectly. ;)

Edited by Christian F.

When dealing with URLs, I have found that the best way to validate them is to run them through cURL: you can follow redirects to the actual URL and check the response code to confirm that the site is reachable.

 

Here's an example. If you want to turn it into a true/false function that only does the check, that works fine too.

There are some extra values you may not need, but they are there in case you want them.

 

<?php
$url = "https://godaddy.com/";

/* Connect to the URL with cURL to see whether it exists and collect the response details. */
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/6.0 (Windows NT 6.2; WOW64; rv:16.0.1) Gecko/20121011 Firefox/16.0.1');
curl_setopt($ch, CURLOPT_HTTP_VERSION, CURL_HTTP_VERSION_1_1);
curl_setopt($ch, CURLOPT_TIMEOUT, 10);
curl_setopt($ch, CURLOPT_MAXREDIRS, 15);
curl_setopt($ch, CURLOPT_HEADER, true);           // include the response headers in the output
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);   // return the response instead of printing it
curl_setopt($ch, CURLOPT_AUTOREFERER, true);
curl_setopt($ch, CURLOPT_FILETIME, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);   // follow HTTP redirects
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);  // skips certificate checks; drop this line to verify peers
curl_setopt($ch, CURLOPT_ENCODING, "");           // accept every encoding cURL supports

// Fetch the page once and keep the raw response for the checks below.
$html = curl_exec($ch);
if ($html === false) {
    $html = "";
}
$response = curl_getinfo($ch);

$valid_url = "Invalid";
$valid_array = array(200, 201, 202, 203, 204, 205, 206, 207, 300, 301, 302, 303, 304, 305, 306, 307);
$finalurl = $response['url'];   // effective URL after cURL followed any redirects
$location = "";

if (in_array($response['http_code'], $valid_array)) {
    $valid_url = "Valid";
    ini_set("user_agent", "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:12.0) Gecko/20100101 Firefox/12.0");
    $headers = @get_headers($response['url']);
    if (!$headers) {
        die("Unable to fetch website.");
    }
    // Use the first Location header, if there is one, as the final URL.
    foreach ($headers as $value) {
        if (substr(strtolower(trim($value)), 0, 9) == "location:") {
            $location = trim(substr($value, 9));
            if ($location == "/") {
                $location = $url;
            }
            break;
        }
    }
}

if ($location != "") {
    $finalurl = $location;
} elseif (preg_match("/window\.location\.replace\('(.*?)'\)/i", $html, $match) ||
        preg_match("/window\.location\=[\"'](.*?)[\"']/i", $html, $match) ||
        preg_match("/location\.href\=[\"'](.*?)[\"']/i", $html, $match)
) {
    // The page redirects with JavaScript instead of an HTTP header.
    $finalurl = $match[1];
}

// show the final url and response code
echo htmlspecialchars($finalurl)."<br />";
echo $response['http_code']." - ".$valid_url."<br />";
?>
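For the true/false variant mentioned above, something along these lines might be enough. This is a rough sketch under a few assumptions: the names is_success_status and url_is_reachable are made up for illustration, any final 2xx or 3xx status is treated as "working", and a HEAD-style request (CURLOPT_NOBODY) is used to avoid downloading the page body.

```php
<?php
// Hypothetical helper: decide which HTTP status codes count as "the URL works".
function is_success_status(int $code): bool
{
    return $code >= 200 && $code < 400;
}

// Hypothetical wrapper around the cURL check that only answers true or false.
function url_is_reachable(string $url, int $timeout = 10): bool
{
    $ch = curl_init($url);
    if ($ch === false) {
        return false;
    }

    curl_setopt_array($ch, [
        CURLOPT_NOBODY         => true,  // request headers only, no body
        CURLOPT_FOLLOWLOCATION => true,  // follow HTTP redirects
        CURLOPT_MAXREDIRS      => 15,
        CURLOPT_TIMEOUT        => $timeout,
        CURLOPT_RETURNTRANSFER => true,  // do not print the response
    ]);

    $ok   = curl_exec($ch) !== false;
    $code = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);

    return $ok && is_success_status($code);
}
```

Some servers reject HEAD requests, so a fallback to a normal GET may be needed in practice.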


I think what you are asking is not so much a technical/programming problem as much as a best practice approach. In all the addresses you mentioned (Web Address, Facebook Address, Twitter Address, Google+ address) I believe all of them use just http, so https, ftp and others are probably not needed. You need to first make that determination - do you need to support protocols other than http? The answer to that question will be a factor in the appropriate path you take. If you do need to allow for other protocols it does become more complex.

 

If you only need to support http, then you know the user may or may not enter the protocol as part of the address. So, I would first check the submitted value to see if the protocol was entered (look for //) and remove it. Or, you could check if it's not there and add it. The important thing is to make sure that the value you use in your processing is "normalized" to a certain format.

 

If you need to support multiple protocols you could check if one was entered and, if it was, verify that it is a valid protocol. If no protocol was entered then I would assume it is http and use that.
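The multi-protocol approach described above could be sketched roughly as follows. The function name apply_default_scheme, the allowed list, and the choice to return null for a rejected protocol are all assumptions made for the example.

```php
<?php
// Hypothetical helper: keep an allowed protocol, reject an unknown one,
// and assume the default when the user entered none.
function apply_default_scheme(
    string $input,
    array $allowed = ['http', 'https', 'ftp'],
    string $default = 'http'
): ?string {
    $input = trim($input);
    if ($input === '') {
        return null;
    }

    // A protocol was entered: accept it only if it is on the list.
    if (preg_match('~^([a-z][a-z0-9+.-]*)://~i', $input, $m)) {
        return in_array(strtolower($m[1]), $allowed, true) ? $input : null;
    }

    // No protocol was entered: assume the default one.
    return $default . '://' . $input;
}

var_dump(apply_default_scheme('www.example.com'));      // string(22) "http://www.example.com"
var_dump(apply_default_scheme('hp://www.example.com')); // NULL
```

The returned value can then be passed through FILTER_VALIDATE_URL before it is stored.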


Thanks for the reply, Physio.

 

I tried your suggestion. I wrote some code to detect whether the submitted URL has a protocol, as you said. It partly does what I expect, but not 100%.

 

$url = 'www.example.com';
$checkProtocol = strpos($url, '://');

if (false === $checkProtocol ) {
   $url = 'http://' . $url;    
   echo 'This is new URL : ' . $url;
} else {
   echo 'Invalid';
}

Suppose the $url variable has the value

htp://www.example.com

It is reported as invalid, and no protocol is assigned.

So I need to check whether the user entered the protocol as part of the address. If the submitted URL has no protocol, I need to add one, in this case http://. Users also sometimes mistype the protocol, e.g. htttp:/, http//, htp://, etc.

 

Given this, I need to know how to completely remove whatever protocol the user entered and prepend a new protocol to the URL before inserting it into the database.
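One way to approach the question above is to strip anything at the start that looks like a protocol, including mistyped ones, and then prepend http://. The sketch below is an assumption-heavy illustration: the function name force_http_scheme is hypothetical, and the pattern only covers a run of letters followed by a colon and slashes, or by two slashes with the colon missing.

```php
<?php
// Hypothetical helper: drop whatever protocol-like prefix was typed,
// prepend http://, and return null if the result is still not a valid URL.
function force_http_scheme(string $input): ?string
{
    // Matches prefixes such as "htp://", "htttp:/" and "http//".
    $rest = preg_replace('~^[a-z]+(?::/+|//)~i', '', trim($input));
    if ($rest === null || $rest === '') {
        return null;
    }

    $url = 'http://' . $rest;

    return filter_var($url, FILTER_VALIDATE_URL) !== false ? $url : null;
}

var_dump(force_http_scheme('htp://www.example.com'));  // string(22) "http://www.example.com"
var_dump(force_http_scheme('http//www.example.com'));  // string(22) "http://www.example.com"
```

Note that this also rewrites a correctly typed https:// to http://, which may not be wanted if the original protocol should be preserved.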

 

Thank You.


Since my post seems to have been stuck in limbo for a bit, I just would like to point out that I did reply to this thread earlier today.

 

Also, if you really want to be 100% sure that you got a valid and active URL, then you could combine mine and QuickOldCar's solutions. First validate the URL with my function, then use cURL to check if it's active.

That saves time on invalid URLs, as you don't have to wait for an HTTP response every time.

Edited by Christian F.

Since my post seems to have been stuck in limbo for a bit, I just would like to point out that I did reply to this thread earlier today.

 

I hid your post because it specifically referenced validation of email addresses, whereas this thread is about URLs. I thought you had jumped the gun and misread the request. Instead, it seems you may have made a typo.

