
[SOLVED] using REGEX to avoid preg_replace of links


SchweppesAle


Hi, the following code replaces every keyword with a linked version of that keyword. However, I would like to use a regex to avoid replacing keywords when they fall inside links.

 

For example, if the keyword is orange,

 

<br>orange</br> would become

 

<br><a href = "somelink">orange</a></br>

 

However, I'd like to avoid the following:

 

original string:

<a href = "http://somedomain.com/orange">yadayada</a>

 

string after preg_replace:

<a href = "http://somedomain.com/<a href = "somelink">orange</a>">yadayada</a>

 

code:

global $mainframe;

$content = $article->text;
$current = JURI::current();

$keywords = $this->params->def('keywords');
$words = explode(",", $keywords);

$keywordLinks = $this->params->def('links');
$Links = explode(",", $keywordLinks);

$number = count($words);

for ($i = 0; $i < $number; $i++)
{
    $Links[$i] = '<a href = "../plugins/content/keyword.php?destination='.urlencode($Links[$i]).'&location='.urlencode($current).'&keyword='.$words[$i].'">'.$words[$i].'</a>';
}

for ($i = 0; $i < $number; $i++)
{
    $words[$i] = '/\b'.$words[$i].'\b/';
    /*$words[$i] = str_replace(' ', '', $words[$i]);*/
}

$content = preg_replace($words, $Links, $content);

$article->text = $content;

Actually, I think it's the slash in the URLs that is throwing the current pattern off.

 

 

reviews/payroll-relief-from-accountantsworld.html

 

payroll would be replaced in the above string; maybe it's due to the slash preceding the keyword (payroll)?

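Nothing to do with the slash specifically: \b matches at any boundary between a word character and a non-word character, so the keyword is found wherever it appears as a whole word, including inside URLs. But I've got a solution for you, using a negative lookahead: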

 

   for($i = 0; $i < $number; $i++)
   {
   $words[$i] = '/' . preg_quote($words[$i], '/') . '(?![^<]*?>)/';
   /*$words[$i] = str_replace(' ', '', $words[$i]);*/
   }

It searches for the keyword and, when it finds one, looks at the characters that follow. If those are zero or more characters other than <, immediately followed by a >, the lookahead rejects the match, because the keyword is inside an HTML tag (i.e. between < and >). But if a < (or the end of the string) comes before any >, the pattern does match, and the keyword is replaced.

 

Basically, keywords found between a set of <> aren't replaced. Hope that's what you're looking for :)
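
Here's a minimal sketch of the lookahead against the two strings from the first post, in case it helps to see it run. The single keyword and the "somelink" target are placeholders for illustration, not values from the plugin.

```php
<?php
// Placeholder keyword and replacement, for illustration only.
$keyword = 'orange';
$link    = '<a href="somelink">orange</a>';

// Match the keyword unless it is followed by non-"<" characters and then ">",
// which would mean it sits inside an open tag (for example an href value).
$pattern = '/' . preg_quote($keyword, '/') . '(?![^<]*?>)/';

$insideTag = '<a href = "http://somedomain.com/orange">yadayada</a>';
$plainText = '<br>orange</br>';

// The keyword is inside the tag's href, so the string comes back unchanged.
$r1 = preg_replace($pattern, $link, $insideTag);

// The keyword is in the text between tags, so it gets linked.
$r2 = preg_replace($pattern, $link, $plainText);

echo $r1, "\n", $r2, "\n";
```

One caveat: the lookahead only protects text inside a tag itself. A keyword in the visible text of an existing link, such as <a href="x">orange juice</a>, is still replaced, so that case would need separate handling.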

 

 

