
[SOLVED] str_replace different url structures


doa24uk


Hi guys,

 

Here's my code. It strips the URL down to site.com instead of http://www.site.com

 

$linkurl = "http://site.com";


// Split link to just get domain name

function parse_url_domain ($url) {
$parsed = parse_url($url);
$hostname = $parsed['host'];
return $hostname;
}

$raw_url = parse_url($linkurl);
$domain_only =str_replace ('www.','', $raw_url);
echo $domain_only['host'];
exit();

 

The problem is I'm using this to strip URLs that have various structures & need to strip them ALL down to sitename.tld

 

eg.

 

http://www4.site2.com

http://site3.com

http://www15.site4.com

 

Is there a way to tell the script to knock off everything in these cases so we're left with

 

site2.com

site3.com

site4.com

 

?????????

 

:facewall: :facewall:

This could get very complex very quickly. But if those are the only types of url then try this.

<?php
$url=array_reverse(explode('.',$url),TRUE);
$url=$url[1].$url[0];
?>

Now if you need it to handle more difficult URLs, let me know and I will take this further.

<?php
$url = 'http://site.com';
// Split link to just get domain name

function parse_url_domain ($url) {
  $parsed = parse_url($url);
  $hostname = $parsed['host'];
  return $hostname;
}

$raw_url = parse_url($url);
$url=array_reverse(explode('.',$raw_url),TRUE);
$domain_only=$url[1].$url[0];
echo $domain_only;
exit();
?>

The problem was, you did not actually call your function.

<?php
$url = 'http://site.com';
// Split link to just get domain name

function parse_url_domain ($url) {
  $parsed = parse_url($url);
  $hostname = $parsed['host'];
  return $hostname;
}

$raw_url = parse_url_domain($url);
$url=array_reverse(explode('.',$raw_url),TRUE);
$domain_only=$url[1].$url[0];
echo $domain_only;
exit();
?>

Sorry to be a pain but this isn't working again.....

 

Your exact code -

 

<?php
$url = 'http://site.com';
// Split link to just get domain name

function parse_url_domain ($url) {
  $parsed = parse_url($url);
  $hostname = $parsed['host'];
  return $hostname;
}

$raw_url = parse_url_domain($url);
$url=array_reverse(explode('.',$raw_url),TRUE);
$domain_only=$url[1].$url[0];
echo $domain_only;
exit();
?>

 

This gives the following results ($url value first, followed by the output):

 

http://site.com -->  comsite

http://www.site.com --> sitewww

http://www2.site.com --> sitewww2

 

-------------

 

So I removed the $url[0] to give the following code, but the output is still wrong (although it's closer)

 

<?php
$url = 'http://www2.site.com';
// Split link to just get domain name

function parse_url_domain ($url) {
  $parsed = parse_url($url);
  $hostname = $parsed['host'];
  return $hostname;
}

$raw_url = parse_url_domain($url);
$url=array_reverse(explode('.',$raw_url),TRUE);
$domain_only=$url[1];
echo $domain_only;
exit();
?>

 

The above code gives the following results

 

http://site.com -->  com

http://www.site.com --> site

http://www2.site.com --> site

 

----

 

 

So to my mind, what we need to do is strip the .com or .net (I will only be using TLDs with one . - so no .co.uk, for example) and then parse the URL.....

 

I'm one step away, help please!

OK, change this line

<?php
$url=array_reverse(explode('.',$raw_url),TRUE);
?>

Like this.

<?php
$url=array_reverse(explode('.',$raw_url));
?>
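For anyone wondering why dropping the TRUE matters: the second argument of array_reverse() is preserve_keys. With it set, the values are reversed but keep their original indexes, so $url[1] and $url[0] still point at the same labels as before the reverse. A minimal sketch, using www2.site.com as the sample host (joining the pieces with a dot is my addition, the code in this thread concatenates them directly):

```php
<?php
// Labels of the host "www2.site.com", in original order.
$labels = explode('.', 'www2.site.com');    // [0 => 'www2', 1 => 'site', 2 => 'com']

// preserve_keys = true: the order is reversed, but each value keeps its old index.
$kept = array_reverse($labels, true);       // [2 => 'com', 1 => 'site', 0 => 'www2']

// Default (preserve_keys = false): a numerically indexed array is re-indexed from 0.
$reindexed = array_reverse($labels);        // [0 => 'com', 1 => 'site', 2 => 'www2']

echo $kept[1] . $kept[0], "\n";                  // sitewww2
echo $reindexed[1] . '.' . $reindexed[0], "\n";  // site.com
```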

Now we need to take it further: you need to check the array for each TLD and remove it, and you also need to check for each possible www prefix, e.g. www1, www2, etc. I will post back soon when I have taken this further.
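If the single-label TLD restriction mentioned earlier in the thread really holds (.com, .net, no .co.uk), one possible shortcut is to skip the TLD and www lists and just keep the last two labels of the host. This is only a sketch under that assumption; last_two_labels() is a made-up name, and the nullable return type needs PHP 7.1 or later.

```php
<?php
/**
 * Hypothetical helper (not from this thread): returns the last two labels of the host.
 * Assumes single-label TLDs such as .com or .net; "site.co.uk" would come back as "co.uk".
 */
function last_two_labels(string $url): ?string {
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return null; // no host found, e.g. the scheme is missing
    }
    $labels = explode('.', $host);
    // A negative offset counts from the end, so this keeps "site" and "com".
    return implode('.', array_slice($labels, -2));
}

foreach (['http://www4.site2.com', 'http://site3.com', 'http://www15.site4.com'] as $url) {
    echo last_two_labels($url), "\n";
}
```

Note that parse_url() only reports a host when the URL has a scheme, so a bare "site.com" returns null here.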

<?php
$urls = array();
$urls[] = 'http://www4.site2.com';
$urls[] = 'http://site3.com';
$urls[] = 'http://www15.site4.com';
$urls[] = 'http://www15.site4.com/alsjdflsjf/asdflajsfdlajsf?lsdjflj&lsdjflsf&lsjdflsjf=alsdjflasjf&lsfjlsdf';

foreach( $urls as $url ) {
    echo get_domain( $url ) . '<br />';
}

/**
* Extracts the domain from a URL
* 
* @param string $url
* @return string
*/
function get_domain( $url ) {
    if( strpos( $url, '://' ) !== false ) {
        list( $throwAway, $url ) = explode( '://', $url );
    }
    if( strpos( $url, '/' ) !== false ) {
        $url = explode( '/', $url );
        $url = array_shift( $url );
    }
    return $url;
}
?>

Updated for what you want:

 

<?php
$urls = array();
$urls[] = 'http://www4.site2.com';
$urls[] = 'http://site3.com';
$urls[] = 'http://www15.site4.com';
$urls[] = 'http://www15.site4.com/alsjdflsjf/asdflajsfdlajsf?lsdjflj&lsdjflsf&lsjdflsjf=alsdjflasjf&lsfjlsdf';

foreach( $urls as $url ) {
    echo get_domain( $url ) . '<br />';
}

/**
* Extracts the domain from a URL
* 
* @param string $url
* @return string
*/
function get_domain( $url ) {
    if( strpos( $url, '://' ) !== false ) {
        list( $throwAway, $url ) = explode( '://', $url );
    }
    if( strpos( $url, '/' ) !== false ) {
        $url = explode( '/', $url );
        $url = array_shift( $url );
    }
    if( preg_match( '/^www[^.]*\..*/', $url ) ) {
        $url = explode( '.', $url );
        array_shift( $url );
        $url = implode( '.', $url );
    }
    return $url;
}
?>

Ok so regular expression matching blows my method out of the water. But this is what I had come up with so far although far from perfected.

<?php
$www=array('www','www1','www2');
$tld=array('co','com','uk','net','org','us','biz');
$url = 'http://www2.site.com';
// Split link to just get domain name
function parse_url_domain ($url) {
  $parsed = parse_url($url);
  return $parsed['host'];
}
$url=parse_url_domain($url);
echo $url.'<br />';
$url=array_reverse(explode('.',$url));
var_dump($url);
echo '<br />';
foreach($www as $key=>$value) {
  if(in_array($value,$url)) {
    unset($url[array_search($value,$url)]);
  }
}
foreach($tld as $key=>$value) {
  if(in_array($value,$url)) {
    unset($url[array_search($value,$url)]);
  }
}
var_dump($url);
echo '<br />';
$domain_only=$url[1];
echo $domain_only;
exit();
?>

By the way roopurt18, that is a genius little script. It uses some of what I started and then accounts for any case by using preg_match().

I don't know why I didn't just use regexp on the whole thing.

 

<?php
$urls = array();
$urls[] = 'http://www4.site2.com';
$urls[] = 'http://site3.com';
$urls[] = 'http://www15.site4.com';
$urls[] = 'http://www15.site4.com/alsjdflsjf/asdflajsfdlajsf?lsdjflj&lsdjflsf&lsjdflsjf=alsdjflasjf&lsfjlsdf';

foreach( $urls as $url ) {
    echo get_domain( $url ) . '<br />';
}

/**
* Extracts the domain from a URL
* 
* @param string $url
* @return string | boolean
*/
function get_domain( $url ) {
    if( preg_match( '@^http://(www[^.]*\.)?([^/]+)@', $url, $matches ) ) {
        return $matches[2];
    }
    return false;
}
?>
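One caveat with the patterns above: they only match http:// URLs, and www[^.]* would also eat a first label such as "wwwfoo". A possible variant, sketched here as an assumption rather than a drop-in replacement, lets parse_url() find the host (so https works too) and only removes a leading "www" followed by optional digits. strip_www() is a made-up name, and the nullable return type needs PHP 7.1 or later.

```php
<?php
/**
 * Hypothetical variant (not from this thread) of get_domain().
 * Lets parse_url() find the host, then drops one leading "www", "www2", "www15", ... label.
 */
function strip_www(string $url): ?string {
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return null; // no host found, e.g. the scheme is missing
    }
    // ^www   the host starts with "www"
    // \d*    optionally followed by digits (www2, www15)
    // \.     the dot that ends that label
    return preg_replace('/^www\d*\./i', '', $host);
}

echo strip_www('https://www.site.com/path?x=1'), "\n";  // site.com
echo strip_www('http://www15.site4.com/a/b'), "\n";     // site4.com
echo strip_www('http://wwwfoo.com'), "\n";              // wwwfoo.com
```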

Awesome guys!

 

@roopurt - that did the trick once I stopped being an idiot & actually looked at the code you'd provided

@Wolfrage - thank you for all your help, pity roopurt pipped you to the prize but we (by that I mean you) would have got there soon enough!

 

This is part of a larger script I'm designing & I would like to credit both of you, where would you like the links pointing for the credits??
