[SOLVED] str_replace different url structures

doa24uk · July 31, 2009

Hi guys,

Here's my code. It strips the URL down to site.com rather than having http://www.site.com

$linkurl = "http://site.com";


// Split link to just get domain name

function parse_url_domain ($url) {
$parsed = parse_url($url);
$hostname = $parsed['host'];
return $hostname;
}

$raw_url = parse_url($linkurl);
$domain_only =str_replace ('www.','', $raw_url);
echo $domain_only['host'];
exit();

The problem is I'm using this to strip URLs that have various structures & need to strip them ALL down to sitename.tld

eg.

http://www4.site2.com

http://site3.com

http://www15.site4.com

Is there a way to tell the script to knock off everything in these cases so we're left with

site2.com

site3.com

site4.com

?????????

:facewall:

WolfRage · July 31, 2009

This could get very complex very quickly. But if those are the only types of url then try this.

<?php
$url=array_reverse(explode('.',$url),TRUE);
$url=$url[1].$url[0];
?>

Now if you need it to handle more difficult URLs, let me know and I will take this further.

doa24uk · July 31, 2009

I'm not entirely sure how to integrate that with my initial script ... :shrug: :confused:

WolfRage · July 31, 2009

<?php
$url = 'http://site.com';
// Split link to just get domain name

function parse_url_domain ($url) {
  $parsed = parse_url($url);
  $hostname = $parsed['host'];
  return $hostname;
}

$raw_url = parse_url($url);
$url=array_reverse(explode('.',$raw_url),TRUE);
$domain_only=$url[1].$url[0];
echo $domain_only;
exit();
?>

doa24uk · July 31, 2009

This just outputs - Array
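
A likely reason for seeing "Array": parse_url() called without a second argument returns an associative array, and an array converted to a string prints as "Array". The sketch below only illustrates that behaviour; the example URL is made up for the illustration and is not from the posts above.

```php
<?php
// Hypothetical example URL, used only for illustration.
$url = 'http://www2.site.com/page';

// Without a second argument, parse_url() returns an associative array
// with keys such as 'scheme', 'host' and 'path'.
$parts = parse_url($url);
var_dump(is_array($parts));    // bool(true)
echo $parts['host'], "\n";      // www2.site.com

// With PHP_URL_HOST it returns just the host as a string,
// which is safe to pass to explode() or str_replace().
$host = parse_url($url, PHP_URL_HOST);
echo $host, "\n";               // www2.site.com
```

Either form works, as long as the string (not the whole array) is what gets exploded or echoed.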

WolfRage · July 31, 2009

The problem was, you did not actually call your function.

<?php
$url = 'http://site.com';
// Split link to just get domain name

function parse_url_domain ($url) {
  $parsed = parse_url($url);
  $hostname = $parsed['host'];
  return $hostname;
}

$raw_url = parse_url_domain($url);
$url=array_reverse(explode('.',$raw_url),TRUE);
$domain_only=$url[1].$url[0];
echo $domain_only;
exit();
?>

doa24uk · July 31, 2009

Sorry to be a pain but this isn't working again.....

Your exact code -

<?php
$url = 'http://site.com';
// Split link to just get domain name

function parse_url_domain ($url) {
  $parsed = parse_url($url);
  $hostname = $parsed['host'];
  return $hostname;
}

$raw_url = parse_url_domain($url);
$url=array_reverse(explode('.',$raw_url),TRUE);
$domain_only=$url[1].$url[0];
echo $domain_only;
exit();
?>

This gives the following results ($url value first, followed by the output):

http://site.com --> comsite

http://www.site.com --> sitewww

http://www2.site.com --> sitewww2

-------------

So I removed the $url[0] to give the following code, but this is still erroring (although it's closer)

<?php
$url = 'http://www2.site.com';
// Split link to just get domain name

function parse_url_domain ($url) {
  $parsed = parse_url($url);
  $hostname = $parsed['host'];
  return $hostname;
}

$raw_url = parse_url_domain($url);
$url=array_reverse(explode('.',$raw_url),TRUE);
$domain_only=$url[1];
echo $domain_only;
exit();
?>

The above code gives the following results

http://site.com --> com

http://www.site.com --> site

http://www2.site.com --> site

----

So to my mind what we need to do is strip the .com or .net (I will only be using TLDs with one . - so no .co.uk for example), then parse the URL.....

I'm one step away, help please!
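
For the single-label TLD case described here, one possible approach is to keep only the last two dot-separated labels of the host. This is a minimal sketch, not code from the thread: the function name last_two_labels is hypothetical, and it assumes the TLD is always one label, so it would give the wrong answer for something like .co.uk.

```php
<?php
// Hypothetical helper. Assumes a single-label TLD (.com, .net, ...),
// so hosts such as example.co.uk are NOT handled correctly.
function last_two_labels($url) {
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return false; // no host could be parsed from the input
    }
    $labels = explode('.', $host);
    // array_slice() with a negative offset keeps the trailing elements.
    return implode('.', array_slice($labels, -2));
}

$examples = array('http://www4.site2.com', 'http://site3.com', 'http://www15.site4.com');
foreach ($examples as $example) {
    echo last_two_labels($example), "\n";
}
```

With the three example URLs from the first post this prints site2.com, site3.com and site4.com.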

WolfRage · July 31, 2009

K make this line

<?php
$url=array_reverse(explode('.',$raw_url),TRUE);
?>

Like this.

<?php
$url=array_reverse(explode('.',$raw_url));
?>

Now we need to take it further: you need to check the array for each TLD and remove it, and you also need to check for each possible www variant, e.g. www1, www2, etc. I will post back soon when I have taken this further.
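
To see what dropping the TRUE argument changes, here is a small sketch of array_reverse() with and without preserve_keys. The host string is just an example value.

```php
<?php
$labels = explode('.', 'www2.site.com'); // [0 => 'www2', 1 => 'site', 2 => 'com']

// preserve_keys = true: the order is reversed, but every value keeps its
// original numeric index, so $kept[0] is still 'www2'.
$kept = array_reverse($labels, true);    // [2 => 'com', 1 => 'site', 0 => 'www2']
echo $kept[1] . $kept[0], "\n";          // sitewww2

// Default: numeric keys are renumbered from 0 after reversing,
// so index 0 is now the TLD and index 1 is the site name.
$renumbered = array_reverse($labels);    // [0 => 'com', 1 => 'site', 2 => 'www2']
echo $renumbered[1] . '.' . $renumbered[0], "\n"; // site.com
```

Note that the dot has to be added back by hand when joining the pieces, since explode() removes it.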

roopurt18 · July 31, 2009

<?php
$urls = array();
$urls[] = 'http://www4.site2.com';
$urls[] = 'http://site3.com';
$urls[] = 'http://www15.site4.com';
$urls[] = 'http://www15.site4.com/alsjdflsjf/asdflajsfdlajsf?lsdjflj&lsdjflsf&lsjdflsjf=alsdjflasjf&lsfjlsdf';

foreach( $urls as $url ) {
    echo get_domain( $url ) . '<br />';
}

/**
* Extracts the domain from a URL
* 
* @param string $url
* @return string
*/
function get_domain( $url ) {
    if( strpos( $url, '://' ) !== false ) {
        list( $throwAway, $url ) = explode( '://', $url );
    }
    if( strpos( $url, '/' ) !== false ) {
        $url = explode( '/', $url );
        $url = array_shift( $url );
    }
    return $url;
}
?>

roopurt18 · July 31, 2009

Updated for what you want:

<?php
$urls = array();
$urls[] = 'http://www4.site2.com';
$urls[] = 'http://site3.com';
$urls[] = 'http://www15.site4.com';
$urls[] = 'http://www15.site4.com/alsjdflsjf/asdflajsfdlajsf?lsdjflj&lsdjflsf&lsjdflsjf=alsdjflasjf&lsfjlsdf';

foreach( $urls as $url ) {
    echo get_domain( $url ) . '<br />';
}

/**
* Extracts the domain from a URL
* 
* @param string $url
* @return string
*/
function get_domain( $url ) {
    if( strpos( $url, '://' ) !== false ) {
        list( $throwAway, $url ) = explode( '://', $url );
    }
    if( strpos( $url, '/' ) !== false ) {
        $url = explode( '/', $url );
        $url = array_shift( $url );
    }
    if( preg_match( '/^www[^.]*\..*/', $url ) ) {
        $url = explode( '.', $url );
        array_shift( $url );
        $url = implode( '.', $url );
    }
    return $url;
}
?>

WolfRage · July 31, 2009

Ok so regular expression matching blows my method out of the water. But this is what I had come up with so far although far from perfected.

<?php
$www=array('www','www1','www2');
$tld=array('co','com','uk','net','org','us','biz');
$url = 'http://www2.site.com';
// Split link to just get domain name
function parse_url_domain ($url) {
  $parsed = parse_url($url);
  return $parsed['host'];
}
$url=parse_url_domain($url);
echo $url.'<br />';
$url=array_reverse(explode('.',$url));
var_dump($url);
echo '<br />';
foreach($www as $key=>$value) {
  if(in_array($value,$url)) {
    unset($url[array_search($value,$url)]);
  }
}
foreach($tld as $key=>$value) {
  if(in_array($value,$url)) {
    unset($url[array_search($value,$url)]);
  }
}
var_dump($url);
echo '<br />';
$domain_only=$url[1];
echo $domain_only;
exit();
?>

By the way roopurt18 that is a genius little script. Utilizes some of what I started and then accounts for any case by using preg_match().

roopurt18 · July 31, 2009

I don't know why I didn't just use regexp on the whole thing.

<?php
$urls = array();
$urls[] = 'http://www4.site2.com';
$urls[] = 'http://site3.com';
$urls[] = 'http://www15.site4.com';
$urls[] = 'http://www15.site4.com/alsjdflsjf/asdflajsfdlajsf?lsdjflj&lsdjflsf&lsjdflsjf=alsdjflasjf&lsfjlsdf';

foreach( $urls as $url ) {
    echo get_domain( $url ) . '<br />';
}

/**
* Extracts the domain from a URL
* 
* @param string $url
* @return string | boolean
*/
function get_domain( $url ) {
    if( preg_match( '@^http://(www[^.]*\.)?([^/]+)@', $url, $matches ) ) {
        return $matches[2];
    }
    return false;
}
?>
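
A possible variant, sketched here under a few assumptions: it lets parse_url() find the host (so https:// URLs also work) and then strips one leading "www" label that is optionally followed by digits. The function name strip_www_prefix and the www-plus-digits rule are assumptions for illustration, not part of the code above. Unlike a pattern that matches any label starting with "www", it leaves a host such as wwwhatever.com untouched.

```php
<?php
// Hypothetical helper; the name and the "www + optional digits" rule
// are assumptions made for this sketch.
function strip_www_prefix($url) {
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return false; // input had no parsable host
    }
    // Remove a single leading label of the form www, www2, www15, ...
    return preg_replace('/^www\d*\./i', '', $host);
}

$examples = array(
    'http://www4.site2.com',
    'http://site3.com',
    'https://www15.site4.com/some/path?x=1',
    'http://www.site.com',
);
foreach ($examples as $example) {
    echo strip_www_prefix($example), "\n";
}
```

Other subdomains (for example blog.site.com) are returned unchanged, which may or may not be what you want.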

doa24uk · July 31, 2009

Awesome guys!

@roopurt - that did the trick once I stopped being an idiot & actually looked at the code you'd provided

@Wolfrage - thank you for all your help, pity roopurt pipped you to the prize but we (by that I mean you) would have got there soon enough!

This is part of a larger script I'm designing & I would like to credit both of you, where would you like the links pointing for the credits??

roopurt18 · July 31, 2009

No need to provide credit. But if you feel you must you can just point back at the URL of this topic. Or my user profile.

Let me know if you need my PayPal to pay royalties and such.
