
replacing keywords but NOT when within an <A> element


markmac


Hi there, I'm new to PHP and Regular Expressions, but I've got some code that is causing me major headaches :)

 

I have some HTML code, and what I'm trying to do is wrap any words that are thought to be 'keywords' in <strong> tags.

 

This works, but because I'm searching through HTML, the regular expression also matches words inside my <a> tags, which breaks the URLs in the href attribute.

 

EXAMPLE...

 

THIS...

<a href="http://java.sun.com/docs/books/tutorial/java/concepts/index.html" rel="external">java website</a>

 

IS REPLACED WITH...

<a href="http://<strong class='keyword'>java</strong>.sun.com/docs/books/tutorial/<strong class='keyword'>java</strong>/concepts/index.html" rel="external"><strong class='keyword'>java</strong> website</a>

 

(NOTE: the <strong> tag applied to the word 'java' in the link text of the <a> element is fine; the problem is that every other occurrence of 'java' inside the href attribute is also wrapped, which breaks the link!)

 

How can I avoid this happening? My code is below, any help would be greatly appreciated!!

 

Thank you :)

 

NOTES:

"$this->_articleKeywords" is a private variable that contains a string of keywords (e.g. "these, are, my, keywords").

 

"$this->_articleContent" is a private variable that contains HTML code (e.g. <p>this is my text</p>).

 

private function replaceKeywords()
{
    $keywords = explode(', ', $this->_articleKeywords);

    // this array will be used to store the same values as the $keywords array except the values are converted to regular expression patterns
    $patterns = array();

    // make sure the keywords array is formatted like a regular expression pattern (e.g. wrapped in //)
    foreach ($keywords as $key => $value) {
        $patterns[$key] = "/\b$value\b/i"; // the pattern is case insensitive, and should only match whole words
    }

    // this array will be used for replacing all found keyword instances
    $replacements = array();

    // create a new array that will be the same as $keywords except the values are wrapped in <strong> tags
    foreach ($keywords as $value) {
        array_push($replacements, "<strong class='keyword'>$value</strong>");
    }

    return preg_replace($patterns, $replacements, $this->_articleContent);
}

 


I'd imagine you don't want keywords inside any tag (including its attributes) to be wrapped in <strong>.

Perhaps this small sample snippet is something along the lines of what you're looking for?

 

Example:

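// Sample input: "sun" and "java" appear both in visible text and inside tag attributes
$text = 'The words <abbr title="Sun Microsystems">sun</abbr> and sun should be bold!
<a href="http://java.sun.com">But only if java and sun is not within a tag attribute!</a>';

// Match runs of text that begin at the start of the string or right after a tag's closing ">"
$pattern = '#(^|>)[^<]+#';

// Callback: wrap each keyword found in the matched text chunk in <strong> tags
function replacement($textChunk)
{
    $keywords = array('#\bjava\b#i', '#\bsun\b#i');
    return preg_replace($keywords, '<strong>\0</strong>', $textChunk[0]);
}

$text = preg_replace_callback($pattern, 'replacement', $text);
echo $text;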
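
To see why this pattern leaves attributes alone, it may help to look at what it actually matches. The following is only a small sketch with a made-up input string (not taken from the posts above): it lists the chunks that '#(^|>)[^<]+#' picks out. Each chunk starts at the beginning of the string or at a tag's closing ">", and runs up to the next "<", so the inside of a tag is never part of a chunk. This assumes reasonably clean markup; a literal ">" inside an attribute value, or text inside <script> or <style>, would not be handled correctly.

```php
<?php
// Hypothetical sample input, chosen only for illustration.
$html = 'Learn java <a href="http://java.sun.com">at the java site</a> today';

// Same chunk pattern as in the reply: start of string or ">",
// followed by one or more characters that are not "<".
preg_match_all('#(^|>)[^<]+#', $html, $matches);

// $matches[0] holds the full text of every chunk that was matched.
$chunks = $matches[0];
print_r($chunks);
```

Note that the href value never shows up in any chunk, which is why the keyword replacement run inside the callback cannot touch it.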


Hi nrg, thanks for posting back!

 

Your example works fine, but I'm struggling to get your code to work within mine (it looks like a variable scope issue, but I could be wrong about that). Could you take a look at my updated code below and let me know how to fix it? Using a hard-coded array like yours works fine, but the moment I reference a private member array, it breaks.

 

The private member '$this->_articleKeywords' is populated from a database, so the array values are unknown until the script runs, and I can't use a hard-coded array like you have.

 

private function replaceKeywords()
{
    // Make an Array from the keywords string (e.g. "this, is, my, list, of, keywords")
    $keywords = explode(', ', $this->_articleKeywords);

    // make sure the keywords array is formatted like a regular expression pattern (e.g. wrapped in forward slashes)
    foreach ($keywords as $key => $value) {
        $this->_updatedKeywords[$key] = "/\b$value\b/i"; // the pattern is case insensitive, and should only match whole words
    }

    // Store the article content
    $text = $this->_articleContent;

    // Create a regular expression pattern that will ignore element attributes
    $pattern = '#(^|>)[^<]+#';

    function replacement($textChunk)
    {
        return preg_replace($this->_updatedKeywords, '<i style="color:green;"><b>\0</b></i>', $textChunk[0]);
    }

    $text = preg_replace_callback($pattern, 'replacement', $text);

    return $text;
}
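
The likely cause is indeed scope: a named function declared inside a method is an ordinary global function, so $this does not exist inside it. One possible fix, sketched below, is to pass an anonymous function (PHP 5.3 or later) and hand it the pattern with "use". The class name Article, its constructor, and the public visibility are assumptions made only so the example runs on its own; the property and method names come from the post. The sketch also runs each keyword through preg_quote() and joins them into a single alternation, so a keyword is replaced in one pass and can never match inside a previously inserted <strong> tag.

```php
<?php
// Hypothetical wrapper class; only the property and method names
// are taken from the post above.
class Article
{
    private $_articleKeywords;
    private $_articleContent;

    public function __construct($keywords, $content)
    {
        $this->_articleKeywords = $keywords;
        $this->_articleContent  = $content;
    }

    public function replaceKeywords()
    {
        // Split the keyword string and escape each keyword for use in a regex.
        $quoted = array();
        foreach (explode(',', $this->_articleKeywords) as $keyword) {
            $keyword = trim($keyword);
            if ($keyword !== '') {
                $quoted[] = preg_quote($keyword, '/');
            }
        }

        // Nothing to highlight: return the content untouched.
        if (count($quoted) === 0) {
            return $this->_articleContent;
        }

        // One case-insensitive, whole-word pattern covering every keyword.
        $keywordPattern = '/\b(?:' . implode('|', $quoted) . ')\b/i';

        // The closure receives $keywordPattern through "use",
        // so it does not need $this at all.
        return preg_replace_callback(
            '#(^|>)[^<]+#',
            function ($textChunk) use ($keywordPattern) {
                return preg_replace(
                    $keywordPattern,
                    '<strong class="keyword">$0</strong>',
                    $textChunk[0]
                );
            },
            $this->_articleContent
        );
    }
}

// Usage with made-up data:
$article = new Article(
    'java, sun',
    '<p>Java on <a href="http://java.sun.com">the sun site</a></p>'
);
echo $article->replaceKeywords();
```

On PHP versions older than 5.3, where closures are not available, the same idea can be expressed by moving the callback into a regular method of the class and passing array($this, 'methodName') as the callback to preg_replace_callback().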
