help me fix this script please

May 10, 2010

Hello,

I am using this script to remove links from a text:

function xcleaner($url)
{
    $U = explode(' ', $url);
    $W = array();

    foreach ($U as $k => $u)
    {
        $W = explode('.', $u);

        if (stristr($u,'http') || (count($W) > 1 && $W[1] != "") || (count($W) > 2))
        {
            unset($U[$k]);
            return implode(' ',$U);
        }
    }

    return implode(' ',$U);
}

The problem is that it also removes the first word after the link.

example:

http://www.link.com hello my name is bob

would result in:

my name is bob

How can I fix this?

Also, I would like to replace the links with the word "(link)" instead of just removing them.

thanks a lot!

ChemicalBliss · May 10, 2010

Have a look at preg_replace(),

http://uk3.php.net/manual/en/function.preg-replace.php

An expression like this should suffice:

// Regex taken from: http://flanders.co.nz/2009/11/08/a-good-url-regular-expression-repost/
$regex = "/(?#Protocol)(?:(?:ht|f)tp(?:s?)\:\/\/|~\/|\/)?(?#Username:Password)(?:\w+:\w+@)?(?#Subdomains)(?:(?:[-\w]+\.)+(?#TopLevel Domains)(?:com|org|net|gov|mil|biz|info|mobi|name|aero|jobs|museum|travel|[a-z]{2}))(?#Port)(?::[\d]{1,5})?(?#Directories)(?:(?:(?:\/(?:[-\w~!$+|.,=]|%[a-f\d]{2})+)+|\/)+|\?|\#)?(?#Query)(?:(?:\?(?:[-\w~!$+|.,*:]|%[a-f\d{2}])+=?(?:[-\w~!$+|.,*:=]|%[a-f\d]{2})*)(?:&(?:[-\w~!$+|.,*:]|%[a-f\d{2}])+=?(?:[-\w~!$+|.,*:=]|%[a-f\d]{2})*)*)*(?#Anchor)(?:\#(?:[-\w~!$+|.,*:=]|%[a-f\d]{2})*)?/";
$newdata = preg_replace($regex,"(link)",$data);

// Where $data is your content you want to replace links for

-cb-

siric · May 10, 2010

Hi,

I ran the script and it worked perfectly.

$url = "http://www.link.com This is a test and hello my name is bob";
$result = xcleaner($url);

print $result;


function xcleaner($url) {

   $U = explode(' ', $url);

   $W =array();
   foreach ($U as $k => $u) 
      {
      $W = explode('.', $u);
      if (stristr($u,'http') || (count($W) > 1 && $W[1] != "") || (count($W) > 2))
         {
         unset($U[$k]);
         return implode(' ',$U);
         }
      }
   return implode(' ',$U);
   }

ScotDiddle · May 10, 2010

ungovernable,

If I understand your request correctly, the following code produces what you want:

link hello my name is bob

http://link.com hello my name is bob

Scot L. Diddle, Richmond VA


<?php

Header("Cache-control: private, no-cache");
Header("Expires: Mon, 26 Jul 1997 05:00:00 GMT");
Header("Pragma: no-cache");

function xcleaner($url) {

   $U    = explode(' ', $url);
   $link = array();

   // Take the first token off the front of the word list.
   $anchorLink = array_shift($U);    // $anchorLink => http://www.link.com
                                     // $U          => hello, my, name, is, bob

   // Not a link: return the text unchanged instead of falling through.
   if (!stristr($anchorLink, 'http')) {
      return $url;
   }

   $W       = explode('.', $anchorLink);
   $numOfWs = count($W);
   $W1      = ($numOfWs > 1) ? $W[1] : '';

   if (($numOfWs > 1 && $W1 != "com") || ($numOfWs > 2)) {
      // e.g. http://www.link.com => keep only the second dot-separated part ("link")
      $link[] = $W1;
   }
   else {
      // e.g. http://link.com => keep the link as it is
      $link[] = $anchorLink;
   }

   return implode(' ', array_merge($link, $U));
}

$url1 = 'http://www.link.com hello my name is bob';
$url2 = 'http://link.com hello my name is bob';

echo xcleaner($url1) . "<br /><br/> \n";
echo xcleaner($url2) . "<br /><br/> \n";

?>

May 10, 2010

Yes, you are right... I just realized the given example will work.

But if I try with this text, it does not work:

http://www.dailymotion.com/video/x4o...me-french_news

http://www.dailymotion.com/video/x4o...french-p2_news

http://www.dailymotion.com/video/x4o...french-p3_news

http://www.dailymotion.com/video/x4o...french-p4_news

http://www.dailymotion.com/video/x4s...french-p5_news

Super Size Me est un film documentaire américain réalisé par Morgan Spurlock. Le journaliste décide de se nourrir exclusivement chez McDonald’s pendant un mois et enquête à travers les États-Unis sur les effets néfastes du fast-food et de la célèbre chaîne spécialiste du hamburger, qui entraînent l'accroissement de l'obésité.

i don't understand where the problem comes from..

actually, i want to replace ALL links with the text "link"

so something like

http://www.awebsite.com/hello/blabla/hi.php would be replaced by "link"
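
A likely cause of the behaviour described above: the return inside the loop of xcleaner() exits after the first matching token, and explode(' ', ...) only splits on spaces, so a link followed by a line break stays glued to the next word and is removed together with it. The sketch below is one possible rewrite, not the code anyone in the thread posted. The function name replace_links() and the rule "a link is a token starting with http:// or https://" are assumptions made for illustration.

```php
<?php
// Hypothetical rewrite of xcleaner(); the name replace_links() is not from the thread.
function replace_links($text, $placeholder = '(link)')
{
    // Split on any run of whitespace and keep the separators, so spaces
    // and line breaks are put back exactly as they were.
    $parts = preg_split('/(\s+)/', $text, -1, PREG_SPLIT_DELIM_CAPTURE);

    foreach ($parts as $i => $part) {
        // Assumption: a "link" is any token that starts with http:// or https://.
        if (preg_match('~^https?://~i', $part) === 1) {
            $parts[$i] = $placeholder;   // replace and keep looping, no early return
        }
    }

    return implode('', $parts);
}

echo replace_links("http://www.link.com hello my name is bob");
// prints: (link) hello my name is bob
```

Because the check looks only for the http:// or https:// prefix, tokens such as file names with a dot in them are left alone, which is a narrower rule than the dot-counting test in the original function.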

ChemicalBliss · May 11, 2010

Have you even tried this script i put together for you?

It does exactly what you want.

-cb-

May 11, 2010

thanks a lot !!! it's working !!

May 17, 2010

i have a problem with this script

for example, this text:

[DL]http://www.megaupload.com/?d=QI29F7AJ[/DL]

maxi repressage de 1982

01 - couleurs sur paris.mp3

02 - maximum.mp3

03 - tout ce fric.mp3

04 - poupee de cire.mp3

05 - piano dub.mp3

AlbumArtSmall.jpg

Folder.jpg

OBERKAMPF-LP-Couleurs5tvert.jpg

will turn into:

[DL](link)[/DL]

maxi repressage de 1982

(link)3

(link)3

(link)3

(link)3

(link)3

AlbumArtSmall.jpg

Folder.jpg

OBERKAMPF-LP-Couleurs5tvert.jpg

i want to convert only the links that start with http://

but the script thinks the list of the mp3 names are links

any help would be appreciated!

May 17, 2010

bump

May 21, 2010

bump!

here's another example of a text that will be messed up once parsed with the function given in ChemicalBliss's post

Streaming:
1 - http://www.ubest1.com/index.php?video_user=13752|hdad|hdad_1272918833_vi deo.flv

2 - http://www.ubest1.com/user/hdad/video/13760

3 - http://www.ubest1.com/user/hdad/video/13767

4 - http://www.ubest1.com/user/hdad/video/13780

ChemicalBliss · May 21, 2010

Sorry about the long reply but it's a simple fix.

If you want it to only pick out URLs with http:// etc. (protocols), then change the ? (0 or 1) to + (1 or more) at the end of the protocol sub-pattern, e.g.:

/(?#Protocol)(?:(?:ht|f)tp(?:s?)\:\/\/|~\/|\/)+(?#Username:Password)(?:\w+:\w+@)?(?#Subdomains)(?:(?:[-\w]+\.)+(?#TopLevel Domains)(?:com|org|net|gov|mil|biz|info|mobi|name|aero|jobs|museum|travel|[a-z]{2}))(?#Port)(?::[\d]{1,5})?(?#Directories)(?:(?:(?:\/(?:[-\w~!$+|.,=]|%[a-f\d]{2})+)+|\/)+|\?|\#)?(?#Query)(?:(?:\?(?:[-\w~!$+|.,*:]|%[a-f\d{2}])+=?(?:[-\w~!$+|.,*:=]|%[a-f\d]{2})*)(?:&(?:[-\w~!$+|.,*:]|%[a-f\d{2}])+=?(?:[-\w~!$+|.,*:=]|%[a-f\d]{2})*)*)*(?#Anchor)(?:\#(?:[-\w~!$+|.,*:=]|%[a-f\d]{2})*)?/
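
For readers who only need "replace anything that starts with a protocol", a much shorter pattern may be enough. The following is a minimal sketch under that assumption, not the regex from this post. The helper name replace_http_links() and the choice of characters that end a URL (whitespace, square brackets, angle brackets, double quotes) are illustrative choices.

```php
<?php
// Hypothetical helper; the name and the simplified pattern are not from the thread.
function replace_http_links($text, $placeholder = '(link)')
{
    // Require an explicit scheme, then take everything up to whitespace,
    // a square bracket, an angle bracket or a double quote.
    $pattern = '~\b(?:https?|ftp)://[^\s\[\]<>"]+~i';

    return preg_replace($pattern, $placeholder, $text);
}

$sample = "[DL]http://www.megaupload.com/?d=QI29F7AJ[/DL]\n"
        . "01 - couleurs sur paris.mp3\n"
        . "Folder.jpg";

echo replace_http_links($sample);
```

Stopping at a square bracket keeps a closing tag like [/DL] out of the match, and requiring the scheme means names such as paris.mp3 are never candidates in the first place.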

If you want to pick out URLs that do not have a protocol in the link, you can remove the [a-z]{2} alternative (match 2 alphabetical characters) together with the | (or) in front of it, and then add all the current two-letter top-level domains to the list yourself (they can and most likely will change). That [a-z]{2} alternative is why this regex matches "paris.mp" and "maximum.mp" etc.: they look like domains (and it's true, they do - https://www.mp/).

A Better Alternative?:

This one should match any 2-character domain, but only if it isn't followed by a 3rd letter or a digit (so it would match .mp but not .mp3).

/(?#Protocol)(?:(?:ht|f)tp(?:s?)\:\/\/|~\/|\/)?(?#Username:Password)(?:\w+:\w+@)?(?#Subdomains)(?:(?:[-\w]+\.)+(?#TopLevel Domains)(?:com|org|net|gov|mil|biz|info|mobi|name|aero|jobs|museum|travel|[a-z]{2}[^a-z0-9]+))(?#Port)(?::[\d]{1,5})?(?#Directories)(?:(?:(?:\/(?:[-\w~!$+|.,=]|%[a-f\d]{2})+)+|\/)+|\?|\#)?(?#Query)(?:(?:\?(?:[-\w~!$+|.,*:]|%[a-f\d{2}])+=?(?:[-\w~!$+|.,*:=]|%[a-f\d]{2})*)(?:&(?:[-\w~!$+|.,*:]|%[a-f\d{2}])+=?(?:[-\w~!$+|.,*:=]|%[a-f\d]{2})*)*)*(?#Anchor)(?:\#(?:[-\w~!$+|.,*:=]|%[a-f\d]{2})*)?/

Last one may not be perfect, not tested it fully. Maybe the guys over at the REGEX forum on phpfreaks can help you further if you need it.

-cb-
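
One caveat with the [a-z]{2}[^a-z0-9]+ idea: the extra character class has to consume at least one character, so the pattern cannot match a domain at the very end of the text, and whatever it consumes becomes part of what preg_replace() replaces. A negative lookahead, (?!...), checks the next character without consuming it. The sketch below compares the two on simplified stand-in patterns written for this illustration, not on the full regex above. The helper first_match() is hypothetical.

```php
<?php
// Both patterns are simplified stand-ins for illustration,
// not the full regex from the post.

// Variant 1: consume at least one non-alphanumeric character after the TLD.
$consuming = '/(?:[-\w]+\.)+[a-z]{2}[^a-z0-9]+/i';

// Variant 2: only look ahead; nothing after the TLD becomes part of the match.
$lookahead = '/(?:[-\w]+\.)+[a-z]{2}(?![a-z0-9])/i';

// Hypothetical helper: return the first matched text, or null if nothing matches.
function first_match($pattern, $text)
{
    return preg_match($pattern, $text, $m) === 1 ? $m[0] : null;
}

var_dump(first_match($lookahead, 'maximum.mp3'));       // NULL: "mp" is followed by a digit
var_dump(first_match($lookahead, 'see example.mp'));    // "example.mp", even at end of text
var_dump(first_match($consuming, 'see example.mp'));    // NULL: nothing left to consume
var_dump(first_match($consuming, 'example.mp, then'));  // "example.mp, " including comma and space
```

If the lookahead behaves as expected on real data, the same (?![a-z0-9]) could be tried in place of [^a-z0-9]+ in the long pattern, though that combination has not been tested here.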
