Emirodgar Posted March 24, 2009

I've posted this problem in other forums and until now nobody has been able to help me, so I hope I can find a solution here.

I've made a script that receives text in HTML format and replaces some words with links. I use regular expressions to detect links, h1, h2 and other elements in the received text so that they are left untouched; the script should only replace plain text. It works great, but sometimes, if the text has a link and the word I want to replace appears inside that link, the script replaces it and breaks the link.

I've made a small script that shows how it works and where it goes wrong; it's ready to run. I think the problem may be in preg_match_all: it fails to match the regular expression and so lets the link be modified.

<?php
/*
I want to replace the word "wordpress" in $content. There are three versions of
$content so you can see the difference between when it works and when it fails;
just comment and uncomment them. If you can see the link GOOD then it's working;
if not, the function has failed.
*/
$findRE = '/wordpress/i';
$find = 'wordpress';
$isFind = false;

$content='This is going to fail. <a href="http://blog.huebel-online.de/2009/01/11/blogintroduction-wordpress-widget-020-released/comment-page-1/#comment-25315">GOOD</a> Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industrys standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.';

//$content='This is going to work good because the word is before. Wordpress. <a href="http://blog.huebel-online.de/2009/01/11/blogintroduction-wordpress-widget-020-released/comment-page-1/#comment-25315">GOOD</a> Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industrys standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.';

/*$content='This is going to work good. If I put \n after and before the link it works!
<a href="http://blog.huebel-online.de/2009/01/11/blogintroduction-wordpress-widget-020-released/comment-page-1/#comment-25315">GOOD</a>
Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industrys standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.';
*/

// Find every occurrence of the word, with its offset.
$matches = array();
preg_match_all($findRE, $content, $matches, PREG_OFFSET_CAPTURE);
$matchData = $matches[0];

// Patterns describing places where the word must not be replaced.
$noChanges = array(
    '/<h[1-6][^>]*>[^<]*'.$find.'[^<]*<\/h[1-6]>/i',
    '/src=("|\')[^"\']*'.$find.'[^"\']*("|\')/i',
    '/alt=("|\')[^"\']*'.$find.'[^"\']*("|\')/i',
    '/title=("|\')[^"\']*'.$find.'[^"\']*("|\')/i',
    '/content=("|\')[^"\']*'.$find.'[^"\']*("|\')/i',
    '/<script[^>]*>[^<]*'.$find.'[^<]*<\/script>/i',
    '/<embed[^>]+>[^<]*'.$find.'[^<]*<\/embed>/i',
    '/wmode=("|\')[^"\']*'.$find.'[^"\']*("|\')/i',
    '/<a[^>]+>[^<]*'.$find.'[^<]*<\/a>/i',
    '/href=("|\')[^"\']+'.$find.'(.*)[^"\']+("|\')/i'
);

foreach($noChanges as $noChange){
    $results = array();
    preg_match_all($noChange, $content, $results, PREG_OFFSET_CAPTURE);
    $matches = $results[0];
}

// Flag every occurrence of the word that falls inside a protected match.
if(!count($matches) == 0) {
    foreach($matches as $match){
        $start = $match[1];
        $end = $match[1] + strlen($match[0]);
        foreach($matchData as $index => $data){
            if($data[1] >= $start && $data[1] <= $end){
                $matchData[$index][2] = true;
            }
        }
    }
}

// Take the first occurrence that was not flagged.
foreach($matchData as $index => $match){
    if($match[2] != true) {
        $isFind = $match;
        break;
    }
}

// Wrap that occurrence in a link.
if(is_array($isFind)){
    $replacement = '<a href="http://wordpress.com"';
    $replacement = $replacement.'title="wordpress" >'.$isFind[0].'</a>';
    $content = substr($content, 0, $isFind[1]) . $replacement. substr($content, $isFind[1] + strlen($isFind[0]));
}

echo $content;
?>

Any ideas? Could anyone help me? Thank you very much!
Link to comment https://forums.phpfreaks.com/topic/150872-problem-with-preg_match_all/
Dtonlinegames Posted March 24, 2009
Are you getting any errors, and what is returned when you run the script?
Emirodgar Posted March 24, 2009
No errors; it just replaces a word that it should not.
thebadbad Posted March 24, 2009
I've not read all your code, but if I understand you right, you want a regular expression pattern that only matches e.g. wordpress outside of HTML links? If that's it, I found a great post in another forum: http://www.phpbuilder.com/board/showpost.php?p=10267832&postcount=11. And my example:

<?php
$str = 'Wordpress <a href="http://wordpress.org/">wordpress</a> wordpress. Another link: <a href="http://wordpress.org/">wordpress</a> and again, wordpress.';
echo preg_replace('~wordpress(?=((?!</a>).)*(<a|$))~is', 'REPLACED', $str);
?>

Output:

REPLACED <a href="http://wordpress.org/">wordpress</a> REPLACED. Another link: <a href="http://wordpress.org/">wordpress</a> and again, REPLACED.
Emirodgar Posted March 24, 2009
Thank you very much for your interest, Dtonlinegames and thebadbad!
thebadbad, that's not exactly what I want. My code works well most of the time, but sometimes it fails, and that's what I don't understand. I use the regular expressions to identify links, and if my program finds the word inside a link it doesn't replace it. Sometimes, though, this doesn't work and the word inside a link gets replaced, so the link is broken. I need to know why the regular expression works in some cases and fails in others, because I haven't been able to find the cause.
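A possible explanation, which nobody in the thread confirmed: in the original script the loop over $noChanges assigns $matches = $results[0] on every pass, so only the matches of the last pattern (the href one) survive. That pattern contains (.*)[^"\']+, which can backtrack heavily on a long line; if PCRE gives up (for example because pcre.backtrack_limit is reached), preg_match_all returns false, nothing is protected, and the word inside the link is replaced. Since . does not match a newline without the s modifier, this would also fit the observation that line breaks around the link make the problem disappear. The sketch below accumulates the protected ranges from every pattern and checks preg_last_error(). The function names collectProtectedRanges, isProtected and linkFirstPlainOccurrence, the two simplified patterns and the sample text are assumptions made for illustration, not code from the thread.

```php
<?php
// Hypothetical helper: gather [start, end) byte ranges matched by every
// protective pattern, and record any pattern PCRE could not evaluate.
function collectProtectedRanges(array $patterns, string $content): array
{
    $ranges = [];
    $errors = [];
    foreach ($patterns as $pattern) {
        $results = [];
        $count = preg_match_all($pattern, $content, $results, PREG_OFFSET_CAPTURE);
        if ($count === false) {
            // e.g. PREG_BACKTRACK_LIMIT_ERROR when the pattern backtracks too much
            $errors[$pattern] = preg_last_error();
            continue;
        }
        foreach ($results[0] as $match) {
            // Append rather than overwrite, so earlier patterns are kept.
            $ranges[] = [$match[1], $match[1] + strlen($match[0])];
        }
    }
    return [$ranges, $errors];
}

// True when $offset lies inside any protected range.
function isProtected(int $offset, array $ranges): bool
{
    foreach ($ranges as [$start, $end]) {
        if ($offset >= $start && $offset < $end) {
            return true;
        }
    }
    return false;
}

// Link the first occurrence of $word that is outside all protected ranges.
function linkFirstPlainOccurrence(string $content, string $word, array $patterns): string
{
    [$ranges, $errors] = collectProtectedRanges($patterns, $content);
    if ($errors !== []) {
        // Assumption: leaving the text alone is safer than risking broken markup.
        return $content;
    }
    $hits = [];
    preg_match_all('/' . preg_quote($word, '/') . '/i', $content, $hits, PREG_OFFSET_CAPTURE);
    foreach ($hits[0] ?? [] as [$text, $offset]) {
        if (!isProtected($offset, $ranges)) {
            $link = '<a href="http://wordpress.com" title="wordpress">' . $text . '</a>';
            return substr($content, 0, $offset) . $link . substr($content, $offset + strlen($text));
        }
    }
    return $content;
}

// Simplified, illustrative patterns: each protects a whole element, tags included.
$patterns = [
    '/<a\b[^>]*>.*?<\/a>/is',
    '/<h[1-6]\b[^>]*>.*?<\/h[1-6]>/is',
];
$content = 'See <a href="http://example.com/wordpress-widget/">GOOD</a> and try WordPress today.';
echo linkFirstPlainOccurrence($content, 'wordpress', $patterns), "\n";
```

Matching each protected element as a whole, instead of requiring the word to appear inside the pattern, keeps the patterns short and avoids the greedy (.*) that is suspected above. Whether this is the actual cause on a given server depends on its PHP version and pcre.backtrack_limit setting.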