JohnnyDoomo Posted March 14, 2013

I've searched long and hard and I can't find anything that works. I'm new to PHP and can't program much, but every tutorial I follow on this doesn't do it the way I need. I need something that can take ANY kind of URL and simply return domain.com. Every tutorial I look at seems to have problems if a domain is submitted that looks like any of the following:

google.com
www.google.com
https://google.com
http://www.google.com
http://www.google.co.uk
http://www.google.co.uk/blah/blah/blah
http://subdomain.google.co.uk/blah/blah/blah
http://www.google.com/blah/blah/blah.php?arg=value#anchor

Every piece of code I find and test screws up on one or another of these. Can anybody please help me with something that is actually somewhat intelligent? I feel like I've only seen tutorials written by programmers who have no idea what I'm trying to get. I'm looking for something that can take a URL, no matter how it is written, and reduce it to its simplest form: domain.com.

I don't know much about PHP, but I've learned that parse_url, IMO, is useless for what I'm trying to do. Every tutorial that tries to help me with a few lines gets it wrong on one of the above domains. I don't know much about if statements, but I'm at the point where I feel I have to learn them just to write dozens of statements to strip everything else out. Please help!
KevinM1 Posted March 14, 2013

Have you looked at parse_url yet? An example:

$url = 'https://www.google.com';
$parsed = parse_url($url);
echo $parsed['host'];
Psycho Posted March 15, 2013

KevinM1 wrote: "Have you looked at parse_url yet?"

That fails on his first two examples.
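To make that failure concrete, here is a small sketch (the $samples array is just a subset of the OP's inputs) that runs parse_url() over a few URLs and reports whether a 'host' key comes back. Without a scheme, parse_url() puts the whole string into 'path', so there is no 'host' to read.

```php
<?php
// Run parse_url() over some of the OP's sample inputs and report the 'host'
// component, or note that it is missing.
$samples = array(
    'google.com',
    'www.google.com',
    'https://google.com',
    'http://www.google.co.uk/blah/blah/blah',
);

foreach ($samples as $url) {
    $parts = parse_url($url);
    if (isset($parts['host'])) {
        echo "$url => host: {$parts['host']}\n";
    } else {
        // No scheme: the whole string lands in 'path', not 'host'
        echo "$url => no host (path: {$parts['path']})\n";
    }
}
```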
Jessica Posted March 15, 2013

So check if the first part is http:// and if not, add it...
Psycho Posted March 15, 2013

Jessica wrote: "So check if the first part is http:// and if not, add it..."

Yeah, I was just writing some code for that when I realized there is a bigger problem. The 'host' index returned by parse_url() holds the entire host name. A host name can have multiple subdomains, and in some cases it ends in a multi-level suffix such as .co.uk. So a URL of 'http://sub1.sub2.google.co.uk' would return 'sub1.sub2.google.co.uk'. How would you programmatically know which of those parts are subdomains? I don't know if .uk is the only one that allows for a "sub" TLD, but if so you could code a special case for it and use logic such as:

If it does not end in .uk:
- Return everything after the second-to-last dot (if there are at least 2 dots), else return the entire string
If it does end in .uk:
- Return everything after the third-to-last dot (if there are at least 3 dots), else return the entire string
Jessica Posted March 15, 2013

Also, it looks like there was a fix in PHP 5.4.7 for the no-http cases.
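As far as I can tell from the parse_url() changelog in the PHP manual, the 5.4.7 change covers URLs that omit the scheme but begin with '//'; a bare host such as www.google.com is still returned as a path. A minimal sketch of the difference, assuming PHP 5.4.7 or later:

```php
<?php
// Compare a bare host with a protocol-relative ("//") version of the same URL.
// Assumes PHP 5.4.7 or later for the "//" case.
$bare     = parse_url('www.google.com/blah');
$relative = parse_url('//www.google.com/blah');

var_dump(isset($bare['host']));  // bool(false): everything is in 'path'
var_dump($relative['host']);     // string(14) "www.google.com"
```

So the fix does not remove the need to add a scheme (or a leading '//') yourself when the input is a bare host.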
Jessica Posted March 15, 2013

http://www.php.net/manual/en/function.parse-url.php#104874 Maybe?
Psycho Posted March 15, 2013 (edited)

Hmm . . . I went ahead and coded around the missing 'http' problem (guess I need to update my PHP install), and I wrote a short script that seems to work the same as that linked script with far fewer lines of code. Not guaranteeing it 100%, but it worked for all the sample values of the OP and the additional testing I did:

<?php
function returnDomainName($url)
{
    //If it does not begin with http, add it
    if(strtolower(substr($url, 0, 4)) != 'http')
    {
        $url = 'http://' . $url;
    }
    //Attempt to get components
    $components = parse_url($url);
    //If failed, return false
    if(!$components) { return false; }
    //Determine how many parts are needed based on .uk at the end
    $partCount = (strtolower(strrchr($components['host'], '.')) != '.uk') ? 2 : 3;
    //Explode based on dots
    $partsAry = explode('.', $components['host']);
    //Implode the last $partCount parts back with a dot
    $domain = implode('.', array_slice($partsAry, -1*$partCount));
    return $domain;
}

//Array of test values
$urlList = array(
    'google.com',
    'www.google.com',
    'https://google.com',
    'http://www.google.cds',
    'http://www.google.co.uk',
    'http://www.google.co.uk/blah/blah/blah',
    'http://sub1.sub2.google.co.uk:443',
    'http://subdomain.google.com/blah/blah/blah',
    'http://www.google.com?rg=value#anchor'
);

//Test loop
foreach($urlList as $url)
{
    echo "URL: $url<br>";
    echo "Domain: " . returnDomainName($url);
    echo "<br><br>";
}
?>

Output

URL: google.com
Domain: google.com

URL: www.google.com
Domain: google.com

URL: https://google.com
Domain: google.com

URL: http://www.google.cds
Domain: google.cds

URL: http://www.google.co.uk
Domain: google.co.uk

URL: http://www.google.co.uk/blah/blah/blah
Domain: google.co.uk

URL: http://sub1.sub2.google.co.uk:443
Domain: google.co.uk

URL: http://subdomain.google.com/blah/blah/blah
Domain: google.com

URL: http://www.google.com?rg=value#anchor
Domain: google.com

Edited March 15, 2013 by Psycho
kicken Posted March 15, 2013

Psycho wrote: "I don't know if .uk is the only one that allows for a "sub" TLD"

It's not; there are a bunch. What it boils down to is how accurate you want to be about it. Only a few are common (in my experience), and you could easily code special cases for those. A little digging around on Wikipedia led me to a list of possible multi-level domains, if you wanted to use it to be more accurate.
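A sketch of kicken's special-case approach using a lookup array. The $multiLevel list and the registrableDomain() name are hypothetical, and the list is deliberately incomplete; for real accuracy you would need a maintained source such as the Public Suffix List.

```php
<?php
// Hypothetical, deliberately incomplete list of two-level public suffixes.
// Extend it as needed; it is an assumption, not an authoritative list.
$multiLevel = array('co.uk', 'org.uk', 'ac.uk', 'com.au', 'net.au', 'co.nz', 'co.jp');

// Reduce a host name such as "sub1.sub2.google.co.uk" to "google.co.uk".
function registrableDomain($host, array $multiLevel)
{
    $labels = explode('.', strtolower($host));
    $count  = count($labels);

    // Keep three labels when the last two form a known multi-level suffix
    $keep = 2;
    if ($count >= 3) {
        $lastTwo = $labels[$count - 2] . '.' . $labels[$count - 1];
        if (in_array($lastTwo, $multiLevel, true)) {
            $keep = 3;
        }
    }

    return implode('.', array_slice($labels, -$keep));
}

echo registrableDomain('sub1.sub2.google.co.uk', $multiLevel), "\n"; // google.co.uk
echo registrableDomain('www.google.com.au', $multiLevel), "\n";      // google.com.au
echo registrableDomain('www.google.com', $multiLevel), "\n";         // google.com
```

Matching the last two labels against a list, rather than only looking at the last one, means a new suffix is one more array entry instead of another branch of logic; the cost is keeping the list up to date.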
JohnnyDoomo Posted March 20, 2013 (author)

(quoting Psycho's script and output above)

Thanks for your help, Psycho! This is working!
Can you tell me what code to add to make it handle both co.uk and com.au domains? (These seem to be the most popular of these extensions, and probably the only two I've actually visited domains on.) For anyone wondering about these types of domain extensions, I came across a large list of them: http://www.quackit.com/domain-names/country_domain_extensions.cfm
Psycho Posted March 22, 2013 (edited)

JohnnyDoomo wrote: "Can you tell me what code to add to make it handle both co.uk and com.au domains?"

I could, but I won't. I love helping people, but at some point you need to teach a man to fish rather than give him a fish. You should be able to see where I implemented the logic to handle co.uk. There's only one line to worry about, which uses two functions and the ternary operator. You should be able to break down what that line is doing and figure out how to change it for multiple scenarios. Of course, you will need to break it out into multiple lines with a normal if/else condition rather than the ternary operator. So give it a try, and if you run into problems, post back with the code you have.

Edited March 22, 2013 by Psycho
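For reference, one way to follow that hint: the ternary from returnDomainName() rewritten as an if/else with a second branch for .au. partCountFor() is a hypothetical helper name, and like the original it assumes every .uk and .au host uses a two-level suffix (co.uk, com.au), which will not hold for every domain.

```php
<?php
// Sketch: the ternary from returnDomainName() rewritten as an if/else so that
// more than one "two-level" country code can be handled. Like the original,
// it assumes every .uk and .au host uses a two-level suffix (co.uk, com.au).
function partCountFor($host)
{
    // Last dot-separated label including the dot, e.g. ".uk"; "" if no dot
    $tld = strtolower((string) strrchr($host, '.'));

    if ($tld === '.uk') {
        $partCount = 3;  // google.co.uk
    } elseif ($tld === '.au') {
        $partCount = 3;  // google.com.au
    } else {
        $partCount = 2;  // google.com
    }

    return $partCount;
}

$host  = 'www.google.com.au';
$parts = explode('.', $host);
echo implode('.', array_slice($parts, -partCountFor($host))), "\n"; // google.com.au
```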
DavidAM Posted March 22, 2013

@Psycho While that covers all of the examples, I would code the scheme check a little more generically. As it is, it will not cover mailto:, ftp: or others.

//If there is no scheme separator (://) anywhere, add http://
if(strpos($url, '://') === false)
{
    $url = 'http://' . $url;
}
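A sketch folding DavidAM's scheme check into Psycho's returnDomainName(), with one added assumption: an isset() guard so input that yields no host returns false instead of triggering an undefined-index notice. Only the .uk special case from the original is kept.

```php
<?php
// Sketch combining DavidAM's scheme check with Psycho's returnDomainName().
// Adds an isset() guard so input without a usable host returns false.
function returnDomainName($url)
{
    // If there is no "scheme://" anywhere, assume http
    if (strpos($url, '://') === false) {
        $url = 'http://' . $url;
    }

    $components = parse_url($url);
    if ($components === false || !isset($components['host'])) {
        return false;
    }

    $host = strtolower($components['host']);
    // Same rule as the original: keep 3 labels for .uk, otherwise 2
    $partCount = (strrchr($host, '.') != '.uk') ? 2 : 3;

    return implode('.', array_slice(explode('.', $host), -$partCount));
}

var_dump(returnDomainName('httpbin.org'));            // string(11) "httpbin.org"
var_dump(returnDomainName('ftp://ftp.google.co.uk')); // string(12) "google.co.uk"
```

The 'httpbin.org' case shows why checking for '://' is sturdier than checking whether the string starts with "http": a bare host that happens to begin with those four letters would otherwise skip the prefix and end up in parse_url()'s 'path'.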