This topic is now archived and is closed to further replies.

michaellunsford

highlighting search terms

Well, I started this in the regular PHP section, but it no longer fits there. Suffice it to say, I'm trying to take the individual search terms that are being $_POSTed and highlight them in the search results.

The [url=http://www.phpfreaks.com/forums/index.php/topic,122532.0.html]Original Post[/url] talked about using str_replace to handle this. The new problem, though, is that the same search terms also show up inside HTML tags (like <img src="search_term">).

I'm trying "/\b(?!<.+?>)search_term\b/" -- but it's still finding "search_term" inside <img src="search_term">.

Thanks!

Either [url=http://www.phpfreaks.com/forums/index.php/topic,99040.msg389868.html#msg389868]separate the tags from the content and process[/url], or just analyze the non-tagged content:

[code]
<pre>
<?php
$tests = array(
'<img src="search_term">',
'<a>search_term</a>',
'<a>Xsearch_termX</a>',
);

$term = 'search_term';
echo "Searching for <b>$term</b>...<br>";
foreach ($tests as $test) {
    echo htmlspecialchars($test), ' => ';
    $test = preg_replace_callback(
        '/>(.+?)</',
        create_function(
            '$matches',
            'return preg_replace("/\b(' . preg_quote($term) . ')\b/", "<b>\\\1</b>", $matches[0]);'
        ),
        $test
    );
    echo htmlspecialchars($test), '<br>';
}
?>
</pre>

[/code]
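
For anyone reading this on a current PHP version: create_function() was deprecated in PHP 7.2 and removed in PHP 8.0, so the snippet above will not run there as written. Below is a minimal sketch of the same idea with an anonymous function instead. The helper name highlight_between_tags() is made up for illustration, and the pattern is kept exactly as posted. preg_replace_callback() calls the callback once per match and hands it the match array, which is the only place $matches comes from.

```php
<?php
// Hypothetical helper (not from the original post): same logic as the
// snippet above, with a closure in place of create_function().
function highlight_between_tags(string $html, string $term): string
{
    // Quote the term, including the "/" delimiter, so it is safe in a pattern.
    $word = '/\b(' . preg_quote($term, '/') . ')\b/';

    return preg_replace_callback(
        '/>(.+?)</',
        function (array $matches) use ($word): string {
            // $matches[0] is the whole ">...<" match, angle brackets included.
            return preg_replace($word, '<b>$1</b>', $matches[0]);
        },
        $html
    );
}

$tests = array(
    '<img src="search_term">',
    '<a>search_term</a>',
    '<a>Xsearch_termX</a>',
);

foreach ($tests as $test) {
    echo htmlspecialchars($test), ' => ';
    echo htmlspecialchars(highlight_between_tags($test, 'search_term')), "\n";
}
```

Only the second test string should come back changed, since the first has the term inside a tag and the third has it inside a longer word.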

Wow, that's intense. All kinds of functions I've never heard of. In fact after reading the manual page on create_function(), I still don't understand it. $matches doesn't exist anywhere outside of create_function()???

Anyway, it's doing the same thing my original preg_replace() was doing (finding matches inside tags). Here's the new function, minus the array stuff:
[code]<?php
$test="<table>\r\n<tr>\r\n\t<td><img src=\"somestring.jpg\" alt=\"\"></td>\r\n</tr>\r\n<tr>\r\n\t<td>somestring</td>\r\n</tr>\r\n</table>\r\n";
$term = 'somestring';
$test = preg_replace_callback(
    '/>(.+?)</',
    create_function(
        '$matches',
        'return preg_replace("/\b(' . preg_quote($term) . ')\b/", "<span style=\"background:#FF0;\">\\\1</span>", $matches[0]);'
    ),
    $test
);
echo $test;
?>[/code]

And the output:

[code]<table>
<tr>
        <td><img src="<span style="background:#FF0;">somestring</span>.jpg" alt=""></td>
</tr>
<tr>
        <td><span style="background:#FF0;">somestring</span></td>
</tr>
</table>[/code]

Hey, check this out...
[code]<?php
$test="<table>\r\n<tr>\r\n\t<td><img src=\"somestring.jpg\" alt=\"\"></td>\r\n</tr>\r\n<tr>\r\n\t<td>somestring</td>\r\n</tr>\r\n</table>\r\n";
preg_match("/>(.+?)</",$test,$new_array);
print_r($new_array);
?>[/code]

result:
[code]Array
(
    [0] => ><img src="somestring.jpg" alt=""><
    [1] => <img src="somestring.jpg" alt="">
)
[/code]

Ah, of course. Change[tt] />(.+?)</ [/tt]to[tt] />(.*?)</[/tt].
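
To see why that one character matters, here is a small sketch on a one-line version of the test markup (the variable names are just for illustration). With .+? the group must take at least one character, so after the ">" of <td> it is forced to swallow the "<" of <img and runs on to the next "<". With .*? the group may be empty, so "><" is consumed as an empty match and the tag contents are never captured.

```php
<?php
// One-line stand-in for the table markup used earlier in the thread.
$html = '<td><img src="somestring.jpg" alt=""></td><td>somestring</td>';

// .+? needs at least one character between ">" and "<".
preg_match_all('/>(.+?)</', $html, $plus);

// .*? also accepts nothing at all between ">" and "<".
preg_match_all('/>(.*?)</', $html, $star);

print_r($plus[1]); // captures that include tag markup
print_r($star[1]); // empty strings plus the real text node
```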

First, let me say it works great!

However, I don't understand why />(.*?)</ is returning ">somestring<" instead of just "somestring" ???

Where? The entire match includes the angle brackets, but the 1st capture does not. It depends on which part you're using.

Here's what />(.*?)</ matches in the example:
[code]    [0] => Array
        (
            [0] => ><
            [1] => ><
            [2] => > <
            [3] => >somestring<
            [4] => > <
        )

    [1] => Array
        (
            [0] =>
            [1] =>
            [2] => 
            [3] => somestring
            [4] => 
        )
[/code]

A better alternative might be [tt]/(?<=>)([^<]+)/[/tt].
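
A rough sketch of how that pattern could be dropped into the highlighter, assuming PHP 5.3 or later for the closure. The function name highlight_text_nodes() is invented for this example. Because the lookbehind does not consume the ">", the match is only the text itself, so the callback can work on $m[1] without touching any angle brackets. One limitation to be aware of: text that comes before the very first tag is not preceded by ">" and is left alone.

```php
<?php
// Hypothetical helper: highlight $term only in text that directly follows a ">".
function highlight_text_nodes(string $html, string $term): string
{
    $word = '/\b(' . preg_quote($term, '/') . ')\b/';

    return preg_replace_callback(
        '/(?<=>)([^<]+)/',
        function (array $m) use ($word): string {
            // $m[1] is a run of non-"<" characters, i.e. text between tags.
            return preg_replace($word, '<span style="background:#FF0;">$1</span>', $m[1]);
        },
        $html
    );
}

$html = "<table>\r\n<tr>\r\n\t<td><img src=\"somestring.jpg\" alt=\"\"></td>\r\n</tr>\r\n"
      . "<tr>\r\n\t<td>somestring</td>\r\n</tr>\r\n</table>\r\n";

echo highlight_text_nodes($html, 'somestring');
```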

Dude, it's magic. One more thing that is not quite working right: MySQL is returning case insensitive results. Is there any way to make [code=php:0]'/\b(' . preg_quote($newstring) . ')\b/'[/code] case insensitive?

Also, if you're willing to educate, I'm struggling to follow this part of the code. (?<=>) is a lookbehind for ">"? It's working perfectly to locate the text I'm searching for -- even text that isn't preceded by ">". I don't get it, is lookbehind optional?

Yes, put an "i" after the closing delimiter: [tt]/pattern/i[/tt]
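
A quick sketch of the "i" modifier on the pattern from the question; the sample text and the value of $newstring are made up. The captured $1 keeps whatever casing was found in the text, so the highlighted output is not forced to the casing of the search term. Passing "/" as the second argument to preg_quote() also escapes the delimiter, in case the term contains a slash.

```php
<?php
// Sample search term; in the thread this would come from the posted form.
$newstring = 'SomeString';

// Trailing "i" makes the match case insensitive.
$pattern = '/\b(' . preg_quote($newstring, '/') . ')\b/i';

$out = preg_replace($pattern, '<b>$1</b>', 'somestring, SOMESTRING and Somestrings');
echo $out;
```

The last word is left alone because \b still requires the term to end at a word boundary.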

You are correct about the lookbehind; however, lookbehinds are not optional (by default). Why do you think it is matching text that isn't preceded by ">"?

[tt]([^<]+) [/tt]is capturing the CRs and NLs, and[tt] (?<=>) [/tt]is still anchoring at ">". Observe:

[code]
<pre>
<?php
$test = "<table>\r\n<tr>\r\n\t<td><img src=\"somestring.jpg\" alt=\"\"></td>\r\n</tr>\r\n<tr>\r\n\t<td>somestring</td>\r\n</tr>\r\n</table>\r\n";
preg_match_all('/(?<=>)([^<]+)/', $test, $matches);
$replace = array(
    "\n" => '\n',
    "\r" => '\r',
);
foreach ($matches as &$array) {
    foreach ($array as &$match) {
        // Swap real CR/LF characters for visible \r and \n markers.
        $match = strtr($match, $replace);
    }
    unset($match);
}
unset($array);
print_r($matches);
?>
</pre>

[/code]


I do like this...

 

[code]<?php
$arrayofwords = array ();
$arrayofwords[0] = "This";
$arrayofwords[1] = "text";
$arrayofwords[2] = "need";
$arrayofwords[3] = "words";

$str = 'This is my <img src="" title="This image text"> long text <a href="#">words</a> where I need to highlight words in the HTML text.';

$str = preg_replace ( "/(?!(?:[^<]+>|[^>]+<\/a>))\b(" . implode ( '|', $arrayofwords ) . ")\b/is", "<strong>\\1</strong>", $str );

echo $str;
?>[/code]
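
One thing to watch with this approach: the words go into the pattern unescaped, so a term containing regex characters (a dot, a slash, parentheses) would change what the pattern means. A possible variation, assuming the words come from user input, is to run each one through preg_quote() first. The function name highlight_words() is invented here; the pattern itself is the one from the post above.

```php
<?php
// Hypothetical wrapper around the pattern above, with each word escaped first.
function highlight_words(string $html, array $words): string
{
    $quoted = array_map(
        function ($word) {
            // Escape regex metacharacters and the "/" delimiter.
            return preg_quote($word, '/');
        },
        $words
    );

    $pattern = '/(?!(?:[^<]+>|[^>]+<\/a>))\b(' . implode('|', $quoted) . ')\b/is';

    return preg_replace($pattern, '<strong>$1</strong>', $html);
}

$str = 'This is my <img src="" title="This image text"> long text '
     . '<a href="#">words</a> where I need to highlight words in the HTML text.';

echo highlight_words($str, array('This', 'text', 'need', 'words'));
```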


Of course, your code only works in sites with strictly Latin words. Otherwise, see this.
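
The link behind "see this" did not survive in this archive, so the following is not necessarily what it pointed to. One common way to handle non-Latin words, offered only as a sketch, is to add the "u" modifier and spell out the word edges with Unicode properties instead of relying on \b. The function name highlight_unicode_word() is hypothetical, and this version works on plain text only; it does nothing to skip HTML tags.

```php
<?php
// Save this file as UTF-8; the "u" modifier treats pattern and subject as UTF-8.
// Hypothetical helper: whole-word highlight that also works for non-Latin scripts.
function highlight_unicode_word(string $text, string $term): string
{
    // A "word character" here is any Unicode letter, any number, or underscore.
    $edge = '[\p{L}\p{N}_]';

    // The term must not be preceded or followed by a word character.
    $pattern = '/(?<!' . $edge . ')(' . preg_quote($term, '/') . ')(?!' . $edge . ')/u';

    return preg_replace($pattern, '<b>$1</b>', $text);
}

echo highlight_unicode_word('слово и словообразование', 'слово'), "\n";
```

Only the standalone word should be wrapped; the same letters at the start of the longer word are followed by another letter and are skipped.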


Also, if you want to exclude search_term from within head/script/a blocks as well as from within tags:

[code]$html = preg_replace_callback(
    '~(<head>.*?</head>|<script\s[^>]*>.*?</script>|<a\s[^>]*>.*?</a>)|search_term(?!(?=[^<>]*>))~is',
    create_function('$matches', 'return isset($matches[1]) ? $matches[1] : "<strong>$matches[0]</strong>" ;'),
    $html
);[/code]
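
The trick in that one-liner is the alternation: whole blocks that must stay untouched are matched first and captured in group 1, and the callback hands them back unchanged; only when the second branch matches does the term get wrapped. Here is a sketch of the same idea with a closure, since create_function() is gone in PHP 8. The function name highlight_outside_blocks() is made up, and two details differ from the post: the term is passed in and quoted, and <script\b is used instead of <script\s so that a bare <script> tag without attributes is skipped too.

```php
<?php
// Hypothetical helper: highlight $term outside tags and outside head/script/a blocks.
function highlight_outside_blocks(string $html, string $term): string
{
    // Branch 1 (group 1): blocks to leave alone.
    // Branch 2: the term, as long as it is not followed by "...>" (inside a tag).
    $pattern = '~(<head>.*?</head>|<script\b[^>]*>.*?</script>|<a\s[^>]*>.*?</a>)'
             . '|' . preg_quote($term, '~') . '(?![^<>]*>)~is';

    return preg_replace_callback(
        $pattern,
        function (array $m): string {
            // Group 1 is non-empty only when a protected block was matched.
            if (($m[1] ?? '') !== '') {
                return $m[1];
            }
            return '<strong>' . $m[0] . '</strong>';
        },
        $html
    );
}

$html = '<script>var search_term = 1;</script>'
      . '<p>search_term <img alt="search_term"> <a href="#">search_term</a></p>';

echo highlight_outside_blocks($html, 'search_term');
```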


[quote]I do like this...[/quote]

I like this too. But can you deal with "<" and ">" too?

