
Problem with preg_match_all


Emirodgar


I've posted this problem in other forums and so far nobody has been able to help me. I hope I can find a solution here.

 

I've made a script that receives text in HTML format and replaces some words with links. I use regular expressions to detect links, h1, h2 and other elements in the received text so that they are not touched; the script should only replace words in plain text.

 

It works great, but sometimes, if the text contains a link and the word I want to replace appears inside that link, the script replaces it anyway and breaks the link.

 

I've made a small script that shows how it works and where it goes wrong; it's ready to run.

I think the problem may be in preg_match_all: sometimes it fails to match the regular expression, so the link gets modified.

 

<?php
/*
I want to replace the word "wordpress" in $content. There are three versions of $content so you can see the difference between when it works and when it fails; just comment and uncomment them.
If the GOOD link is still intact in the output, it's working; if not, the function has failed.
*/

$findRE = '/wordpress/i';

$find = 'wordpress';
$isFind = false;

$content='This is going to fail. <a href="http://blog.huebel-online.de/2009/01/11/blogintroduction-wordpress-widget-020-released/comment-page-1/#comment-25315">GOOD</a> Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industrys standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.';

//$content='This is going to work good because the word is before. Wordpress. <a href="http://blog.huebel-online.de/2009/01/11/blogintroduction-wordpress-widget-020-released/comment-page-1/#comment-25315">GOOD</a> Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industrys standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.';

/*$content='This is going to work good. If I put \n after and before the link it works!
<a href="http://blog.huebel-online.de/2009/01/11/blogintroduction-wordpress-widget-020-released/comment-page-1/#comment-25315">GOOD</a> 
Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industrys standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.';
*/

$matches = array();
preg_match_all($findRE, $content, $matches, PREG_OFFSET_CAPTURE);
$matchData = $matches[0];

$noChanges = array(
'/<h[1-6][^>]*>[^<]*'.$find.'[^<]*<\/h[1-6]>/i',
'/src=("|\')[^"\']*'.$find.'[^"\']*("|\')/i',
'/alt=("|\')[^"\']*'.$find.'[^"\']*("|\')/i',
'/title=("|\')[^"\']*'.$find.'[^"\']*("|\')/i',
'/content=("|\')[^"\']*'.$find.'[^"\']*("|\')/i',
'/<script[^>]*>[^<]*'.$find.'[^<]*<\/script>/i',
'/<embed[^>]+>[^<]*'.$find.'[^<]*<\/embed>/i',
'/wmode=("|\')[^"\']*'.$find.'[^"\']*("|\')/i',
'/<a[^>]+>[^<]*'.$find.'[^<]*<\/a>/i',
'/href=("|\')[^"\']+'.$find.'(.*)[^"\']+("|\')/i'
);

foreach($noChanges as $noChange){
$results = array();
preg_match_all($noChange, $content, $results, PREG_OFFSET_CAPTURE);
$matches = $results[0];

}

if(count($matches) > 0) {
foreach($matches as $match){
	$start = $match[1];
	$end = $match[1] + strlen($match[0]);
	foreach($matchData as $index => $data){
		if($data[1] >= $start && $data[1] <= $end){
			$matchData[$index][2] = true;
		}
	}
}
}		

foreach($matchData as $index => $match){
if(empty($match[2])) { // index 2 is only set for matches inside a protected range
	$isFind = $match;
	break;
}
}

if(is_array($isFind)){
$replacement = '<a href="http://wordpress.com" ';
$replacement =	$replacement.'title="wordpress">'.$isFind[0].'</a>';

$content = substr($content, 0, $isFind[1]) . $replacement . substr($content, $isFind[1] + strlen($isFind[0]));
}
echo $content;

?>

 

Any ideas? Could anyone help me?

 

Thank you very much!
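
A note on the loop over $noChanges in the script above: each pass assigns $results[0] to $matches, so once the loop ends only the matches of the last pattern (the href one) are left, and the h1, src, alt and other patterns never protect anything. Below is a minimal sketch of collecting the protected ranges from every pattern instead. The function names, the shortened pattern list and the example.com URL are made up for illustration and are not from the original post.

```php
<?php
// Hypothetical helper: gather the byte ranges matched by EVERY
// "do not touch" pattern, instead of keeping only the last pattern's matches.
function collectProtectedRanges(array $patterns, string $content): array
{
    $ranges = [];
    foreach ($patterns as $pattern) {
        $results = [];
        if (preg_match_all($pattern, $content, $results, PREG_OFFSET_CAPTURE) === false) {
            continue; // the pattern could not be run; preg_last_error() says why
        }
        foreach ($results[0] as [$text, $offset]) {
            $ranges[] = [$offset, $offset + strlen($text)]; // half-open [start, end)
        }
    }
    return $ranges;
}

// Hypothetical helper: first match of $findRE that lies outside all ranges,
// as a [text, offset] pair, or null when every occurrence is protected.
function firstUnprotectedMatch(string $findRE, string $content, array $ranges): ?array
{
    if (!preg_match_all($findRE, $content, $found, PREG_OFFSET_CAPTURE)) {
        return null;
    }
    foreach ($found[0] as $match) {
        $inside = false;
        foreach ($ranges as [$start, $end]) {
            if ($match[1] >= $start && $match[1] < $end) {
                $inside = true;
                break;
            }
        }
        if (!$inside) {
            return $match;
        }
    }
    return null;
}

// Shortened sample input; two of the original patterns with "wordpress" inlined.
$content = '<h1>About wordpress</h1> <a href="http://example.com/">wordpress</a> I like wordpress.';
$patterns = [
    '/<h[1-6][^>]*>[^<]*wordpress[^<]*<\/h[1-6]>/i',
    '/<a[^>]+>[^<]*wordpress[^<]*<\/a>/i',
];

$ranges = collectProtectedRanges($patterns, $content);
$match  = firstUnprotectedMatch('/wordpress/i', $content, $ranges);
```

With this input, the occurrences inside the heading and inside the link text are both skipped, and $match points at the last, plain-text "wordpress".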

https://forums.phpfreaks.com/topic/150872-problem-with-preg_match_all/

I've not read all your code, but if I understand you right, you want a regular expression pattern that only matches e.g. wordpress outside of HTML links? If that's it, I found a great post in another forum: http://www.phpbuilder.com/board/showpost.php?p=10267832&postcount=11. And my example:

 

<?php
$str = 'Wordpress <a href="http://wordpress.org/">wordpress</a> wordpress. Another link: <a href="http://wordpress.org/">wordpress</a> and again, wordpress.';
echo preg_replace('~wordpress(?=((?!</a>).)*(<a|$))~is', 'REPLACED', $str);
?>

 

Output:

REPLACED <a href="http://wordpress.org/">wordpress</a> REPLACED. Another link: <a href="http://wordpress.org/">wordpress</a> and again, REPLACED.
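
The same pattern can also be pointed at the case from the first post, where the word only appears inside an href. This is a rough sketch, not a drop-in fix: the test strings are shortened stand-ins with a placeholder example.com URL, and the limit of 1 is an assumption based on the original script replacing only the first plain-text occurrence.

```php
<?php
// Pattern from the post above: match the word only when no </a> follows
// before the next <a or the end of the string.
$pattern = '~wordpress(?=((?!</a>).)*(<a|$))~is';

// Hypothetical shortened inputs; example.com stands in for the real URL.
$inLinkOnly    = 'Intro. <a href="http://example.com/my-wordpress-widget/">GOOD</a> Lorem ipsum.';
$withPlainWord = $inLinkOnly . ' I use WordPress daily, and wordpress again.';

// $0 re-inserts the matched text, so the original capitalisation is kept.
$link = '<a href="http://wordpress.com" title="wordpress">$0</a>';

// Fourth argument 1 = replace only the first match.
$a = preg_replace($pattern, $link, $inLinkOnly, 1);
$b = preg_replace($pattern, $link, $withPlainWord, 1);

echo $a, "\n", $b, "\n";
```

The first string comes back unchanged because its only occurrence sits inside the href. In the second, only "WordPress" after the link is wrapped. Note that this pattern guards <a> elements only; a word inside, say, an img alt attribute would still be replaced.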

Thank you very much for your interest Dtonlinegames and thebadbad!

 

thebadbad, that's not exactly what I want. My code works well most of the time, but sometimes it fails, and that's what I don't understand.

 

I use the regular expressions to identify links, and if my program finds the word inside a link it doesn't replace it. Sometimes, though, this doesn't work and a word inside a link gets replaced, so the link is broken.

 

I need to know why the regular expression works sometimes and fails at other times, because I'm not able to find the solution :(
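
One possible explanation, which nobody in the thread confirmed: preg_match_all() returns false instead of an array when PCRE gives up, for example when the pcre.backtrack_limit setting is exceeded. The script never checks for that, so a failed href pattern would look the same as "no links found". In the href pattern, (.*) runs to the end of the line and then has to back out again; without the s modifier it stops at a newline, which would fit the observation that putting \n around the link makes it work. The sketch below only shows how to surface such a failure with preg_last_error(). The wrapper name, the placeholder URL, the filler text and the deliberately lowered limit are all assumptions for the demo.

```php
<?php
// Assumption for the demo: disable JIT and lower the limit so the failure
// is easy to reproduce on current PHP builds. Real defaults are much higher.
ini_set('pcre.jit', '0');
ini_set('pcre.backtrack_limit', '1000');

// Hypothetical wrapper: like preg_match_all(), but fails loudly.
function matchAllOrFail(string $pattern, string $subject): array
{
    $results = [];
    if (preg_match_all($pattern, $subject, $results, PREG_OFFSET_CAPTURE) === false) {
        throw new RuntimeException('PCRE error code ' . preg_last_error());
    }
    return $results[0];
}

// Last pattern of the original script, with $find inlined.
$hrefRE = '/href=("|\')[^"\']+wordpress(.*)[^"\']+("|\')/i';

$link = '<a href="http://example.com/a-wordpress-b">GOOD</a>'; // placeholder URL
$tail = str_repeat('Lorem ipsum dolor sit amet. ', 200);       // long text without quotes

$oneLine   = 'Intro. ' . $link . ' ' . $tail;
$multiLine = "Intro.\n" . $link . "\n" . $tail;

foreach (['one line' => $oneLine, 'link on its own line' => $multiLine] as $label => $content) {
    try {
        echo $label, ': ', count(matchAllOrFail($hrefRE, $content)), " match(es)\n";
    } catch (RuntimeException $e) {
        echo $label, ': ', $e->getMessage(), "\n";
    }
}
```

If the error code is PREG_BACKTRACK_LIMIT_ERROR, a tighter pattern such as href=("|')[^"']*wordpress[^"']*("|') avoids the long (.*) run in the first place.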

