
Complex replacing of strings in tags


gerkintrigg


I think this might be a complex issue, so bear with me...

 

I am writing a spell-checker.

 

I have managed to work out how to take user input and use a database to replace words within the document. The SQL orders the words and phrases from the database so that the longer strings (the phrases) are processed first. Each one is then passed through preg_replace().

 

To make it a little more complex, I am using preg_replace() together with a loop to catch lowercase words, uppercase words, and words with a capital first letter followed by lowercase letters.
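As a rough illustration of the three case forms described above, here is a minimal sketch. The function name case_variants() is hypothetical and not from the original code; it also assumes single-byte (ASCII) words, since strtolower() and strtoupper() are not multibyte-aware.

```php
<?php
// Hypothetical helper: returns the lowercase, capitalised and uppercase
// forms of a word, without duplicates (e.g. a one-letter word has only two).
function case_variants(string $word): array
{
    $lower = strtolower($word);
    $variants = [$lower, ucfirst($lower), strtoupper($word)];
    // array_unique() keeps the first occurrence; array_values() reindexes.
    return array_values(array_unique($variants));
}

print_r(case_variants('Body'));
```

Each variant could then be turned into its own pattern, much as the loop described above does.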

 

Each word is being replaced with a span tag, which is constructed like this:

'<span id="'.$word.'" class="'.$style.'" 
onclick="javascript: 
MyAjaxRequest(\'ajax\',\'../hello.php?state='.$word_all.'_nst_id_007\'); 
popup(\'popUpDiv\',\'\',\'\')">'.$word.'</span>',

 

All is well so far.

 

My problem is that when a part of a string is replaced, I want all other strings within the span tag to be ignored.

 

So for example, if the string is:

I am a string and I would like to replace this body of text here with something else

and the first replacement in the database is:

this body of text here
with
<span id="this_body_of_text_here" 
class="highlight" 
onclick="javascript: MyAjaxRequest(\'ajax\',\'../hello.php?state='.$word_all.'_nst_id_007\'); 
popup(\'popUpDiv\',\'\',\'\')">this body of text here</span>

then the second string might be:

body of text

Currently, the system will replace the second string so that the code contains a <span> tag within the first <span> tag. I just want it to ignore it.

 

My code so far is:

# Original (case-insensitive) word from the database:
$uc_word = ucwords($word);
$upper_word = strtoupper($word);
# preg_quote() escapes regex metacharacters (and the ~ delimiter) so each word is matched literally.
# The lookahead (?![^<]*?>) rejects matches that sit inside a tag's angle brackets.
$pattern = array(
'~\b'.preg_quote($word, '~').'\b(?![^<]*?>)~',
'~\b'.preg_quote($uc_word, '~').'\b(?![^<]*?>)~',
'~\b'.preg_quote($upper_word, '~').'\b(?![^<]*?>)~'
);

# Replacement highlighted phrase:
$new_word = array(
'<span id="'.$word_all.'_nst_id_007" class="'.$style.'" onclick="javascript: MyAjaxRequest(\'ajax\',\'../hello.php?state='.$word_all.'_nst_id_007\'); popup(\'popUpDiv\',\'\',\'\')">'.$word.'</span>',
'<span id="'.$word_all.'_nst_id_007" class="'.$style.'" onclick="javascript: MyAjaxRequest(\'ajax\',\'../hello.php?state='.$word_all.'_nst_id_007\'); popup(\'popUpDiv\',\'\',\'\')">'.$uc_word.'</span>',
'<span id="'.$word_all.'_nst_id_007" class="'.$style.'" onclick="javascript: MyAjaxRequest(\'ajax\',\'../hello.php?state='.$word_all.'_nst_id_007\'); popup(\'popUpDiv\',\'\',\'\')">'.$upper_word.'</span>'
);
# Now highlight the words:
$page = preg_replace($pattern, $new_word, $page);

 

This does not do the span-tag checking that I need: it does not ensure that a match is not inside an existing highlighted area.

 

I'm sure I need something like this (though I know the code is wrong):

'~\b(?!'<span class="highlight"'?)'.$word.'\b(?.*)(?!'</span>')(?![^<]*?>)~',

 

I just have no idea how to make the lookaround check whether the preceding text contains an opened, but not yet closed, <span> tag.

 

If I could establish whether the text contains an open <span> tag, then check if the tag was closed, then in theory, I could make sure that any replacements were only executed if the smaller text is not within the larger (already highlighted) text area.
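One possible way to get this "not inside an open span" check without a lookbehind is PCRE's backtracking verbs, which PHP's preg functions support: let the pattern first consume any complete highlighted span and throw it away with (*SKIP)(*FAIL), so the word can only match in the text that is left. The sketch below is only that, a sketch. The function name highlight_outside_spans() is hypothetical, the replacement span is simplified (no id or onclick), and it assumes highlighted spans carry class="highlight" and never contain a nested </span>.

```php
<?php
// Hypothetical helper: wraps $word in a highlight span, but leaves text that
// is already inside <span ... class="highlight" ...>...</span> untouched.
function highlight_outside_spans(string $page, string $word): string
{
    // First alternative: match a whole highlighted span, then discard it
    // and resume scanning after it ((*SKIP)(*FAIL)).
    // Second alternative: the word itself, not inside a tag's brackets.
    $pattern = '~<span\b[^>]*\bclass="highlight"[^>]*>.*?</span>(*SKIP)(*FAIL)'
             . '|\b' . preg_quote($word, '~') . '\b(?![^<]*>)~s';

    return preg_replace_callback(
        $pattern,
        function (array $m): string {
            return '<span class="highlight">' . $m[0] . '</span>';
        },
        $page
    );
}

$page = 'I would like to replace this body of text here with something else';
// Longest phrase first, as the database query already orders them.
foreach (['this body of text here', 'body of text'] as $phrase) {
    $page = highlight_outside_spans($page, $phrase);
}
echo $page;
// I would like to replace <span class="highlight">this body of text here</span> with something else
```

Because the scan runs left to right, the opening <span is always reached before the text inside it, so the shorter phrase never gets a chance to match within the highlighted area.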

 

Oh my goodness, I hope it makes sense this time...

 

Please help!

 

Link to comment
Share on other sites

If I understand correctly, you are looping through a string and replacing matches, but once it finds a match, you don't want it to keep preg_replacing stuff, yes? Use a loop, preg_replacing one at a time (use the fourth argument of preg_replace() to limit it to one replacement), and if something is replaced, break out of the loop.
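A minimal sketch of that suggestion, assuming the phrases arrive longest first and using a simplified replacement span. The fourth argument of preg_replace() is the limit and the fifth receives the number of replacements made; the variable names here are illustrative only.

```php
<?php
$page = 'this string contains this body of text and works for an example';
// Assumed to be ordered longest first, as the SQL query does.
$phrases = ['this body of text', 'this body', 'body of text', 'body'];

foreach ($phrases as $phrase) {
    $pattern = '~\b' . preg_quote($phrase, '~') . '\b~';
    // Limit of 1: replace at most one occurrence; $count reports how many were made.
    $page = preg_replace($pattern, '<span class="highlight">$0</span>', $page, 1, $count);
    if ($count > 0) {
        break; // something was replaced, so stop processing further phrases
    }
}

echo $page;
// this string contains <span class="highlight">this body of text</span> and works for an example
```

Note that this stops after the first phrase that matches anywhere, so unrelated phrases elsewhere in the text would not be highlighted either.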

Link to comment
Share on other sites

No, the looping is fine as such... all that side of things is working.

 

Let's say that I have this string:

$main_string="this string contains this body of text and works for an example";

the following words & phrases are in the database:

$db_string_1="this body of text";
$db_string_2="this body";
$db_string_3="body of text";
$db_string_4="body";

 

Currently, the system replaces the longest string first (which is what I want), so that's $db_string_1, which gets wrapped in a <span> that encapsulates the matched word or phrase.

 

Then it checks the others to make sure there is no < or > before or after the word. That all works fine, so because the first match was replaced, it ignores $db_string_2 and $db_string_3. But the word "body" is inside the span without being directly next to either a > or a < symbol, so it wraps the word "body" ($db_string_4) in another tag. Therefore, when I strip out the highlighting tags for the output, it leaves one set of tags in. And I don't want that.

 

In the example below, I have taken out the javascript just so it makes my example easier to read.

 

so it would currently do this:

output:
this string contains 
    <span class="highlight">
          this 
             <span class="highlight">
                  body
             </span>
          of text
    </span> 
and works for an example

 

and I want this:

output:
this string contains 
    <span class="highlight">
       this body of text
   </span> 
and works for an example
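The reason "body" slips through can be reproduced in isolation. The sketch below applies the same lookahead, (?![^<]*?>), to a string that already contains one highlighted span; the sample string and variable names are made up for the demonstration. The lookahead only asks "is there a > ahead before the next <", which is true inside a tag's angle brackets but not for text sitting between the opening and closing tags.

```php
<?php
$page  = 'contains <span class="highlight">this body of text</span> and works';
$guard = '(?![^<]*?>)'; // same lookahead as in the patterns above

// "class" sits inside the tag's angle brackets: a > follows before any <,
// so the negative lookahead rejects it.
$inTag = preg_match('~\bclass\b' . $guard . '~', $page);

// "body" sits between the tags: the next angle bracket is the < of </span>,
// so the lookahead lets it through and it would be wrapped again.
$inSpan = preg_match('~\bbody\b' . $guard . '~', $page);

var_dump($inTag);  // int(0)
var_dump($inSpan); // int(1)
```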

 

Link to comment
Share on other sites

Wait what? Maybe it's just me but your second post seems to describe a completely different problem to your first. Make a post that says...

 

1.) The input string will contain x.

2.) The database contains y.

3.) The output should be z.

 

Forget what you're currently doing; it's somewhat irrelevant to coming up with a working system.

Link to comment
Share on other sites

Cags, the last post contains all of the requested data...

input:

$main_string="this string contains this body of text and works for an example";

database:

$db_string_1="this body of text";
$db_string_2="this body";
$db_string_3="body of text";
$db_string_4="body";

what I want the output to be:

this string contains 
    <span class="highlight">
       this body of text
   </span> 
and works for an example

Link to comment
Share on other sites

When you're just replacing simple text, you can achieve this effect with strtr(). But I'm not sure how to get it done with regular expressions involved.

 

$str = 'this string contains this body of text and works for an example';
$replace = array(
'this body of text' => '<span class="highlight">this body of text</span>',
'this body' => '<span class="highlight">this body</span>',
'body of text' => '<span class="highlight">body of text</span>',
'body' => '<span class="highlight">body</span>'
);
echo strtr($str, $replace);
//this string contains <span class="highlight">this body of text</span> and works for an example
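One way the same single-pass, longest-match-first behaviour might be carried over to regular expressions is to join all the phrases into one alternation and replace them in a single preg_replace_callback() call, so replaced text is never scanned again. This is only a sketch: highlight_phrases() is a hypothetical name, the i flag matches any mix of upper and lower case (wider than the three case forms in the original code), and the input is assumed to be plain text with no existing markup.

```php
<?php
// Hypothetical helper: highlights every phrase in one pass over the text.
function highlight_phrases(string $text, array $phrases): string
{
    if (!$phrases) {
        return $text; // nothing to look for
    }
    // Longest first, so "this body of text" is tried before "body".
    usort($phrases, function (string $a, string $b): int {
        return strlen($b) <=> strlen($a);
    });
    // Escape each phrase so it is matched literally.
    $alternatives = array_map(function (string $p): string {
        return preg_quote($p, '~');
    }, $phrases);
    $pattern = '~\b(?:' . implode('|', $alternatives) . ')\b~i';

    // $m[0] is the text as it appeared, so the original case is preserved.
    return preg_replace_callback($pattern, function (array $m): string {
        return '<span class="highlight">' . $m[0] . '</span>';
    }, $text);
}

$str = 'this string contains this body of text and works for an example';
echo highlight_phrases($str, ['body', 'this body', 'body of text', 'this body of text']);
// this string contains <span class="highlight">this body of text</span> and works for an example
```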

Link to comment
Share on other sites

Could I combine my code so that it counts whether the tag is open, so if I do something like replacing it with:

<span class="highlight">[--open_tag--]text[--close_tag--]</span>

then each time I replace, I could count the number of

[--open_tag--]

and

[--close_tag--]

and if one equals the other, then replace, otherwise, don't.

 

At the final output I could str_replace() the

[--open_tag--]

and

[--close_tag--]

with nothing to remove them.

 

That seems to work in my head... but not sure of syntax.
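A rough sketch of how that counting idea might look, under some assumptions: replace_if_balanced() and the two constants are hypothetical names, the replacement span is simplified, and phrases are processed longest first. For each match it counts the markers in the text before the match and only wraps the match when the opens and closes are equal.

```php
<?php
const OPEN_TAG  = '[--open_tag--]';
const CLOSE_TAG = '[--close_tag--]';

// Hypothetical helper: wraps $word only where it is not inside an open marker pair.
function replace_if_balanced(string $page, string $word): string
{
    $pattern = '~\b' . preg_quote($word, '~') . '\b(?![^<]*>)~';
    if (!preg_match_all($pattern, $page, $matches, PREG_OFFSET_CAPTURE)) {
        return $page;
    }
    // Work from right to left so earlier offsets stay valid after each splice.
    foreach (array_reverse($matches[0]) as [$text, $offset]) {
        $before = substr($page, 0, $offset);
        if (substr_count($before, OPEN_TAG) !== substr_count($before, CLOSE_TAG)) {
            continue; // an open marker has not been closed: already highlighted
        }
        $wrapped = '<span class="highlight">' . OPEN_TAG . $text . CLOSE_TAG . '</span>';
        $page = substr_replace($page, $wrapped, $offset, strlen($text));
    }
    return $page;
}

$page = 'this string contains this body of text and works for an example';
foreach (['this body of text', 'this body', 'body of text', 'body'] as $phrase) {
    $page = replace_if_balanced($page, $phrase);
}
// Strip the markers for the final output.
$page = str_replace([OPEN_TAG, CLOSE_TAG], '', $page);
echo $page;
// this string contains <span class="highlight">this body of text</span> and works for an example
```

The markers must not contain < or >, otherwise the tag lookahead in the pattern would treat them as markup.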

Link to comment
Share on other sites

This thread is more than a year old. Please don't revive it unless you have something important to add.
