isedeasy Posted October 15, 2009

I did a quick Google search and a forum search but could not find anything to help me (I may be searching with the wrong words). Basically, I want to check whether a URL that a user inputs already exists in my database. The problem is that it may exist in a slightly different format, for example:

http://google.com
http://www.google.com
www.google.com
www.google.com/
www.google.com/index.html

These are all the same URL, but if I treat them as strings they are obviously different. How would I go about this? I'm guessing I need to format the strings before I check/insert into the database? Is there already a function out there that does this?

Cheers

Link to comment https://forums.phpfreaks.com/topic/177788-compare-two-urls/
mrMarcus Posted October 15, 2009

What's common about all of those? google.com

You can use either regex or explode() on the vars. The most efficient thing to do would be to normalize the data going into the db so that it is easy to cross-reference, i.e. when the domain name is first inserted into the db, strip any .com's, www.'s, http://'s, etc., or any other parts you feel like. That way you always know what to check against in the db, making life much easier.

Gotta run, otherwise I would've left some examples. Hopefully somebody else will pick up where I left off.
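A minimal sketch of the normalize-before-storing idea, using PHP's built-in parse_url() rather than regex or explode(). The function name normalizeUrl() is hypothetical, and treating index.html and index.php as default documents is an assumption taken from the examples in the question, not something every server guarantees. Unlike the suggestion above, it keeps the TLD so that google.com and google.org stay distinct.

```php
<?php
// Hypothetical helper (not from the thread): reduce a URL to a canonical key
// that is used both when inserting into the db and when looking a URL up.
function normalizeUrl(string $url): string
{
    $url = trim($url);

    // parse_url() only reports a host when a scheme is present, so add one
    // for inputs such as "www.google.com/".
    if (!preg_match('#^[a-z][a-z0-9+.-]*://#i', $url)) {
        $url = 'http://' . $url;
    }

    $parts = parse_url($url);
    if ($parts === false || empty($parts['host'])) {
        return ''; // assumption: unparseable input maps to an empty key
    }

    // Host names are case-insensitive; drop only a leading "www."
    $host = preg_replace('/^www\./', '', strtolower($parts['host']));

    // Assumption: a trailing index.html or index.php is the default document
    $path = $parts['path'] ?? '';
    $path = preg_replace('#/index\.(html|php)$#i', '/', $path);
    $path = rtrim($path, '/');

    // Keep the query string, since it can select a different page
    $query = isset($parts['query']) ? '?' . $parts['query'] : '';

    return $host . $path . $query;
}

$inputs = [
    'http://google.com',
    'http://www.google.com',
    'www.google.com',
    'www.google.com/',
    'www.google.com/index.html',
];
foreach ($inputs as $input) {
    echo normalizeUrl($input), "\n"; // each line prints google.com
}
```

Storing the normalized key in its own column with a unique index would let the database reject duplicates, while the original URL can still be kept alongside it for display.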
isedeasy Posted October 15, 2009

That makes sense, that's what I thought I would need to do. Cheers
isedeasy Posted October 15, 2009

OK, I think I pretty much have it. It's not very pretty, but it seems to work:

<?php
function formatURL($url)
{
    // An array of strings I wish to remove
    $remove = array("http://", "www.", "index.html", "index.php");

    // Loop through the array
    foreach ($remove as $value) {
        // Replace each value with nothing (case-insensitive)
        $url = str_ireplace($value, '', $url);
    }

    // If the URL ends in a /, trim it off
    $url = rtrim($url, '/');

    return $url;
}

if (isset($_POST['test'])) {
    // Escape user input before echoing it back
    echo '<h1>' . htmlspecialchars(formatURL($_POST['url'])) . '</h1>';
}
?>
<form action="<?php echo htmlspecialchars($_SERVER['PHP_SELF']); ?>" method="post">
    <input type="text" size="80" name="url"/>
    <input type="submit" name="test"/>
</form>

Any feedback would be appreciated.
Robbrad Posted October 15, 2009

Not bad, but what happens when you get subdomains, e.g. adwords.google.com? Do you want to exclude them as well?
isedeasy Posted October 15, 2009

No, not if it points to a different page. Basically, I don't want somebody to be able to add a link to a page that already exists in my database. I can already spot an issue, though, just by looking at the URL of this page.
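One issue with the str_ireplace() approach (possibly the one meant here) is that it removes the listed strings wherever they occur, not only at the start or end of the URL. A rough sketch of an anchored variant follows; the name formatUrlAnchored() and the example.com URLs are made up for illustration.

```php
<?php
// Hypothetical variant of formatURL(): every pattern is anchored, so only a
// leading scheme, a leading "www." and a trailing index file are removed.
function formatUrlAnchored(string $url): string
{
    $url = trim($url);
    $url = preg_replace('#^https?://#i', '', $url);          // scheme at the start only
    $url = preg_replace('#^www\.#i', '', $url);              // "www." at the start only
    $url = preg_replace('#/index\.(html|php)$#i', '', $url); // index file at the end only
    return rtrim($url, '/');
}

// Unanchored removal also hits a match in the middle of the URL:
echo str_ireplace('www.', '', 'http://example.com/?ref=www.google.com'), "\n";

// Anchored removal leaves index.php alone when something follows it:
echo formatUrlAnchored('http://example.com/index.php?topic=1'), "\n";

// Subdomains are kept, so they still count as different pages:
echo formatUrlAnchored('http://adwords.google.com/'), "\n";
```

Because subdomains other than www. are left in place, adwords.google.com and google.com remain separate entries, which matches the requirement that only links to the same page should collide.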
Robbrad Posted October 15, 2009

Try:

function isURL($url)
{
    $protocol = '(http://|https://)';
    $allowed  = '([a-z0-9]([-a-z0-9]*[a-z0-9]+)?)';

    // preg_replace() needs pattern delimiters; # is used because the pattern contains /
    $regex = '#^' . $protocol .          // must include the protocol
        '(' . $allowed . '{1,63}\.)+' .  // one or more subdomain labels, each followed by a dot
        '[a-z]{2,6}#i';                  // followed by a TLD

    // Removes the matched protocol and host name, returning whatever follows
    $trimmedURL = preg_replace($regex, '', $url);

    return $trimmedURL;
}

$newURL = isURL($url);