thara Posted January 15, 2013

Hello,

I have a form that collects several URLs from users, e.g. web address, Facebook address, Twitter address, Google+ address, etc. My problem is how to validate these URLs when the form is submitted. I have tried validating them in PHP with FILTER_VALIDATE_URL and with a simple regular expression.

I would like to know the best way to collect such URLs from users. Is it a good idea to always make them enter the protocol? They may not know whether it is http, https, ftp, ftps, etc., and I think that is hard for some users.

I tried something like this using FILTER_VALIDATE_URL, but it always requires a protocol, and I am sometimes confused about how it works:

    // validate url
    $url = 'http://www.example.com';
    if (filter_var($url, FILTER_VALIDATE_URL)) {
        echo "<br>valid";
    } else {
        echo "<br>invalid";
    }

OUTPUT: valid

    // validate url
    $url = 'hp://www.example.com';
    if (filter_var($url, FILTER_VALIDATE_URL)) {
        echo "<br>valid";
    } else {
        echo "<br>invalid";
    }

OUTPUT: valid

    // validate url
    $url = 'http://example.com';
    if (filter_var($url, FILTER_VALIDATE_URL)) {
        echo "<br>valid";
    } else {
        echo "<br>invalid";
    }

OUTPUT: valid

    // validate url
    $url = 'http://example.com?id=32&name=kamalani';
    if (filter_var($url, FILTER_VALIDATE_URL)) {
        echo "<br>valid";
    } else {
        echo "<br>invalid";
    }

OUTPUT: valid

Can you tell me the best ways to get URLs from users and how to validate them? Any comments are greatly appreciated.

Thank you.
Christian F. Posted January 15, 2013 (edited)

As you noted, the VALIDATE_EMAIL filter for PHP isn't quite as good as it should have been, causing some false positives (and false negatives). However, in a recent reply to another thread I posted an updated version of my own validation function, which does pretty well: http://forums.phpfreaks.com/topic/273066-how-to-convert-links-to-anchor-tags/#entry1405224

I'm sure there are some corner cases that I haven't taken into consideration yet, but for the test cases provided (which cover just about any legit URL I can think of) it works perfectly.
QuickOldCar Posted January 15, 2013

When dealing with URLs, I have found the best way to validate them is to run them through cURL. You can follow them to the actual URLs after redirects and check the response codes to confirm that they can be reached.

Here's an example. If you want to turn this into a true/false function that only does the checking, that can work fine. There are some extra values you may not need, but they are there in case you want them.

    <?php
    $url = "https://godaddy.com/";

    /* Connect to the URL using cURL to see whether it exists and to get information about it. */
    //$cookie = tempnam('tmp', 'cookie');
    //$cookie_file_path = "tmp/";
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    //curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie);
    //curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie_file_path);
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/6.0 (Windows NT 6.2; WOW64; rv:16.0.1) Gecko/20121011 Firefox/16.0.1');
    curl_setopt($ch, CURLOPT_HTTP_VERSION, CURL_HTTP_VERSION_1_1);
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);
    curl_setopt($ch, CURLOPT_MAXREDIRS, 15);
    curl_setopt($ch, CURLOPT_HEADER, true);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_AUTOREFERER, true);
    curl_setopt($ch, CURLOPT_FILETIME, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    // Note: this turns off certificate verification; remove it for a stricter check.
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
    curl_setopt($ch, CURLOPT_ENCODING, "");

    // Fetch the page once; $html holds the headers plus body, or false on failure.
    $html = curl_exec($ch);
    $response = curl_getinfo($ch);

    $valid_url = "Invalid";
    $location = "";
    $valid_array = array(200, 201, 202, 203, 204, 205, 206, 207, 300, 301, 302, 303, 304, 305, 306, 307);

    if (in_array($response['http_code'], $valid_array)) {
        $valid_url = "Valid";
        ini_set("user_agent", "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:12.0) Gecko/20100101 Firefox/12.0");
        $headers = @get_headers($response['url']);
        //print_r($headers);
        if (!$headers) {
            die("Unable to fetch website.");
        }
        // Use the first Location header, if there is one, as the redirect target.
        foreach ($headers as $value) {
            $value = trim($value);
            if (substr(strtolower($value), 0, 9) == "location:") {
                $location = trim(substr($value, 9));
                if ($location == "/") {
                    $location = $url;
                }
                break;
            }
        }
    }

    // Otherwise look for a JavaScript redirect in the page, or fall back to the URL cURL ended on.
    if ($location != "") {
        $finalurl = $location;
    } elseif (is_string($html)
        && (preg_match("/window\.location\.replace\('(.*?)'\)/i", $html, $value)
            || preg_match("/window\.location\=[\"'](.*?)[\"']/i", $html, $value)
            || preg_match("/location\.href\=[\"'](.*?)[\"']/i", $html, $value))
    ) {
        $finalurl = $value[1];
    } else {
        $finalurl = $response['url'];
    }

    // Show the final URL and response code.
    echo htmlspecialchars($finalurl) . "<br />";
    echo $response['http_code'] . " - " . $valid_url . "<br />";
    ?>
Psycho Posted January 15, 2013

I think what you are asking is not so much a technical/programming problem as a question of best practice. In all the addresses you mentioned (Web Address, Facebook Address, Twitter Address, Google+ address) I believe all of them use just http, so https, ftp and others are probably not needed. You need to make that determination first: do you need to support protocols other than http? The answer to that question will be a factor in the appropriate path you take. If you do need to allow for other protocols, it does become more complex.

If you only need to support http, then you know the user may or may not enter the protocol as part of the address. So, I would first check the submitted value to see if the protocol was entered (look for //) and remove it. Or, you could check if it's not there and add it. The important thing is to make sure that the value you use in your processing is "normalized" to a certain format.

If you need to support multiple protocols, you could check if one was entered and, if it was, verify that it is a valid protocol. If no protocol was entered, then I would assume it is http and use that.
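A minimal sketch of the normalization described in this post, assuming PHP 7.1 or later. The function name `normalize_url` and the `$allowed` whitelist are illustrative and do not come from the thread; the default of http and https is an assumption you would adjust to your own requirements.

```php
<?php
// Hypothetical helper (not from the thread): normalizes a user-supplied URL.
// $allowed is an assumed whitelist of the schemes the application accepts.
function normalize_url(string $input, array $allowed = ['http', 'https']): ?string
{
    $url = trim($input);
    if ($url === '') {
        return null;
    }

    // A scheme is a letter followed by letters, digits, "+", "." or "-",
    // then "://", at the very start of the string.
    if (preg_match('~^([a-z][a-z0-9+.-]*)://~i', $url, $m)) {
        if (!in_array(strtolower($m[1]), $allowed, true)) {
            return null; // a protocol was entered, but not one we accept
        }
    } else {
        $url = 'http://' . $url; // no protocol entered: assume http
    }

    // Final syntax check on the normalized value.
    return filter_var($url, FILTER_VALIDATE_URL) !== false ? $url : null;
}

var_dump(normalize_url('www.example.com'));        // returns 'http://www.example.com'
var_dump(normalize_url('https://example.com/me')); // returned unchanged
var_dump(normalize_url('hp://www.example.com'));   // returns null: 'hp' is not whitelisted
```

Checking the scheme against a whitelist is what rejects a value like hp://www.example.com, which FILTER_VALIDATE_URL on its own accepts because it only checks the general URL syntax.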
thara Posted January 15, 2013

Thanks for the reply, Psycho. I tried your suggestion and wrote some code to detect whether the submitted URL has a protocol. It does part of what I expect, but not 100%.

    $url = 'www.example.com';
    $checkProtocol = strpos($url, '://');
    if (false === $checkProtocol) {
        $url = 'http://' . $url;
        echo 'This is new URL : ' . $url;
    } else {
        echo 'Invalid';
    }

If the $url variable has a value such as htp://www.example.com, it is reported as an invalid URL and no protocol is assigned.

So I need to check whether or not the user entered the protocol as part of the address. If the submitted URL has no protocol, I need to add one; in this case it is http://. Users may also type the protocol incorrectly, e.g. htttp:/, http//, htp://, etc.

For that case I need to know how to completely remove the protocol the user entered and add a new protocol to the URL before inserting it into the database.

Thank you.
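One possible way to handle the mistyped prefixes listed in this post, sketched under the assumption that only http and https matter. The function name `fix_http_prefix` and the pattern are illustrative guesses at what counts as a "mistyped http"; they are a heuristic, not a standard, and will not catch every typo.

```php
<?php
// Hypothetical helper (not from the thread): repairs a mistyped http(s) prefix.
function fix_http_prefix(string $input): string
{
    $url = trim($input);

    // An "http"-like word (repeated letters tolerated, e.g. "htttp", "htp"),
    // an optional "s", then ":" with any number of slashes, or "//" without a colon.
    if (preg_match('~^h+t+p+(s?)(?::/*|//+)~i', $url, $m)) {
        $rest = substr($url, strlen($m[0]));
        $scheme = $m[1] === '' ? 'http' : 'https';
        return $scheme . '://' . $rest;
    }

    // Some other well-formed scheme: leave it alone for later validation.
    if (preg_match('~^[a-z][a-z0-9+.-]*://~i', $url)) {
        return $url;
    }

    // No protocol at all: assume http.
    return 'http://' . $url;
}

echo fix_http_prefix('htp://www.example.com'), "\n";  // http://www.example.com
echo fix_http_prefix('htttp:/www.example.com'), "\n"; // http://www.example.com
echo fix_http_prefix('http//www.example.com'), "\n";  // http://www.example.com
echo fix_http_prefix('www.example.com'), "\n";        // http://www.example.com
```

The pattern only fires when the http-like word is followed by a colon or a double slash, so a host name such as httpd.apache.org is left intact and simply gets http:// prepended. The result should still be passed through a syntax check such as FILTER_VALIDATE_URL before it is stored.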
Christian F. Posted January 15, 2013 (edited)

Since my post seems to have been stuck in limbo for a bit, I just would like to point out that I did reply to this thread earlier today.

Also, if you really want to be 100% sure that you have a valid and active URL, then you could combine my solution with QuickOldCar's. First validate the URL with my function, then use cURL to check if it's active. That saves you some time for invalid URLs, as you don't have to wait for the HTTP response every time.
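A rough sketch of that two-step idea, assuming PHP 7.0 or later with the cURL extension loaded. It uses PHP's built-in FILTER_VALIDATE_URL as a stand-in for the custom validation function linked earlier in the thread, and the function name `url_is_live` is hypothetical.

```php
<?php
// Hypothetical helper (not from the thread): true only if the URL is
// syntactically valid AND the server answers with a 2xx or 3xx status.
function url_is_live(string $url, int $timeout = 10): bool
{
    // Step 1: cheap syntax check, no network traffic.
    if (filter_var($url, FILTER_VALIDATE_URL) === false) {
        return false;
    }
    $scheme = strtolower((string) parse_url($url, PHP_URL_SCHEME));
    if ($scheme !== 'http' && $scheme !== 'https') {
        return false;
    }

    // Step 2: ask the server, requesting headers only.
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_NOBODY, true);          // send a HEAD request
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // do not print the response
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);  // follow redirects
    curl_setopt($ch, CURLOPT_MAXREDIRS, 5);
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);
    $ok = curl_exec($ch) !== false;
    $code = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);

    return $ok && $code >= 200 && $code < 400;
}

var_dump(url_is_live('not a url'));         // bool(false), rejected before any request is sent
var_dump(url_is_live('ftp://example.com')); // bool(false), scheme is not http or https
```

Some servers answer HEAD requests with an error even though a normal GET would succeed, so a false result from the second step is a hint rather than proof that the page does not exist.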
Psycho Posted January 15, 2013

Christian F. wrote: "Since my post seems to have been stuck in limbo for a bit, I just would like to point out that I did reply to this thread earlier today."

I hid your post because it specifically referenced validation of email addresses, whereas this thread is about URLs. I thought you had jumped the gun and misread the request. Instead, it seems you may have made a typo.
Christian F. Posted January 15, 2013 (edited)

Ah, yes. Sorry about that. Guess the force (of habit) is strong in me. :\

This is one of those times I really wish I could go back and edit my posts.