Remove text from string


johnsmith153

I have a text string (which I can't change).

 

I need to remove from it an entire sentence where a provided value exists in that sentence somewhere.

 

Example:

$sentence = "The quick brown fox jumped over the lazy dog. This is a test. Thank you.";

function remove_sentence($remove_value, $sentence) {
  
  // TODO: remove all sentences containing $remove_value from $sentence

  return $new_sentence;

}

remove_sentence("quick brown", $sentence); // should return 'This is a test. Thank you.'

 

Can someone help me complete this?


That's a pretty tall order, given the complexity of what you are asking. You could use a regular expression to look for a segment of the string that contains the text, where the segment either begins the string or follows a period, and ends with a period. But that is not foolproof: if a sentence contains something like "$12.35", the decimal point would be treated as the closing period. You could tighten it so that the segment ends in a period + space, or in a period that is the last character in the string, but that still doesn't cover every possible scenario.
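
To make the "$12.35" problem concrete, here is a small sketch. The sample text and variable names are made up for illustration. It compares splitting on every period with splitting only on a period that is followed by whitespace or ends the string.

```php
<?php
// Hypothetical sample text; the price is there to show the decimal problem.
$text = 'The book costs $12.35 today. This is a test. Thank you.';

// Naive: every period ends a sentence, so "$12.35" is cut in two.
$naive = explode('.', $text);

// Slightly better: a sentence ends at a period followed by whitespace,
// or at a period that is the last character of the string.
$better = preg_split('/\.(?:\s+|$)/', $text, -1, PREG_SPLIT_NO_EMPTY);

print_r($naive);  // 5 pieces, the first one ends at "$12"
print_r($better); // 3 pieces, the price stays intact
```

The second split still treats any "period + whitespace" as a sentence break, so it is only as good as that rule.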


I'm happy with a semi-foolproof solution. I think one where a sentence ends with ". " (full stop followed by a space) is good enough for me.

 

I would appreciate it if somebody could point me in the right direction as to a regular expression that would do this. I'm guessing preg_replace, but as to the regular expression itself I am clueless.

 

Thanks for everybody's input so far.

 


Use strstr(); if it finds the value, reset the $sentence variable.

 

thanks

 


Well obviously that's not going to work. Thanks for trying though.


You could try this. Going by what you described, it should do the job:

<?php
$sentence = "The quick brown fox jumped over the lazy dog. This is a test. Thank you.";

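function remove_sentence($remove_value, $sentence) {

  // Split on ". " so that each element holds one sentence
  $parts = explode(". ", $sentence);
  $kept = array();

  foreach ($parts as $part) {
    // Keep only the sentences that do not contain the search text
    if (strpos($part, $remove_value) === false) {
      $kept[] = $part;
    }
  }

  // Re-join with the same delimiter that was used to split
  return implode(". ", $kept);

}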

echo remove_sentence("quick brown", $sentence); // returns 'This is a test. Thank you.'
?>


I see a problem. The last sentence probably won't have a space after the period.


Correct. I had thought about that but didn't include it in my first response as it was late. But the real trick is not finding the search string within a sentence; it is determining what constitutes a sentence. The "best" solution I can think of is as follows:

 

The beginning of a sentence is either:

- The start of the entire text being processed

- A character that follows a period + 1 or more white-space characters (space, tab, line-break, etc.)

 

The end of a sentence is whichever of the following comes first after the beginning of a sentence:

- The end of the entire text being processed

- A period + space
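
A rough sketch of those rules as a splitter, for anyone who wants to try them out. The function name is made up, and I am assuming any whitespace (not just a space) may follow the period:

```php
<?php
// Hypothetical helper name; splits text into sentences using the rules above.
function split_sentences_sketch(string $text): array
{
    // Split at whitespace that directly follows a period. The lookbehind
    // (?<=\.) checks for the period without consuming it, so each sentence
    // keeps its own closing period.
    return preg_split('/(?<=\.)\s+/', trim($text), -1, PREG_SPLIT_NO_EMPTY);
}

$sentences = split_sentences_sketch(
    "The quick brown fox jumped over the lazy dog. This is a test.\nThank you."
);
print_r($sentences);
```

Each element can then be checked with strpos() or stripos() and the survivors re-joined with implode().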

 

And I'm sure that if I gave it more thought I would find potential flaws in that. I know you can use ^ and $ in the regex for the start and end of the string, but that won't work for this. I think there are separate start/end constructs that can be used for this, but I forget how they are implemented.
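
For reference, the constructs in question are most likely PCRE lookaround assertions. A minimal sketch of the difference, using the sample sentence from the first post: ^ only matches at the very start of the text, while a lookbehind can assert "just after a period + whitespace" at any position without consuming those characters.

```php
<?php
$text = "The quick brown fox jumped over the lazy dog. This is a test. Thank you.";

// ^ anchors to the start of the whole string (no m modifier here),
// so only the first word of the text is found.
preg_match_all('/^\w+/', $text, $anchored);

// (?<=\.\s) is a zero-width lookbehind: the position must be preceded by
// a period and one whitespace character, but neither is part of the match.
preg_match_all('/(?:^|(?<=\.\s))\w+/', $text, $starts);

print_r($anchored[0]); // first word of the text only
print_r($starts[0]);   // first word of every sentence
```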


Parsing this in a single expression would get really ugly. It's much 'simpler' to explode with one expression, and then use string functions to find a simple match, or another expression to find complex matches.

 

<?php

$string = "This is a string. We're going to find instances
of 'foo bar' and remove those sentences from the
string. I'm going to try this using odd spacing
as well. See if it can detect when 'foo
bar' are on different lines, but the same sentence.
We'd have to tweak it further if we wanted to match
HTML line-breaks though. It would probably be easiest
to simply convert them to normal line-breaks. Oh, yeah
'foo bar'";

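function get_rid_of_some_sentences( $string, $containing ) {

	// Escape regex metacharacters (including the # delimiter used below),
	// then let each space match one or more whitespace characters
	$containing = str_replace(' ', '\s+', preg_quote($containing, '#'));

	// Split into sentences on a period followed by one whitespace character
	$parts = preg_split('/\.\s/', $string);

	for( $i = 0, $max = count($parts); $i < $max; $i++ ) {
		// Drop any sentence that contains the search text (case-insensitive)
		if( preg_match('#'.$containing.'#i', $parts[$i]) )
			unset($parts[$i]);
	}

	// Re-join the remaining sentences
	return implode('. ', $parts);

}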

echo get_rid_of_some_sentences($string, 'foo bar');

?>

 

Works okay. Let me know if there's anything that breaks it.


OK, playing around with "Lookahead and Lookbehind Zero-Width Assertions", I was able to come up with a regular expression that will do all this in one go. It looked like a fun project that would help me learn some more complex regex.

 

Using the same $string in Xyph's sample code I used this expression:

#(?<=.\s)[^.]*?foo[\s]*bar[^.]*\b.#is

 

To test, I ran preg_match_all to verify what was being captured:

preg_match_all("#(?<=.\s)[^.]*?foo[\s]*bar[^.]*\b.#is", $string, $matches);

 

Here are the results:

Array
(
    [0] => Array
        (
            [0] => We're going to find instances of 'foo bar' and remove those sentences from the string.
            [1] => See if it can detect when 'foo bar' are on different lines, but the same sentence.
            [2] => Oh, yeah 'foo bar'
        )
)

 

So, going back to your requirement to remove all sentences that contain the target text you can do it in one line:

$new_string =  preg_replace("#(?<=.\s)[^.]*?foo[\s]*bar[^.]*\b.#is", "", $string);

 

EDIT: Assuming this needs to be done with target text that is variable, you would want to have some process to format that specific string before using in the regex. For example, for "foo bar" I replaced the [space] with "\s" to handle scenarios where there is a line break, tab, etc. between the words. To accomplish that with variable target text you could do a replacement on the target text first.

 

So here is a function that should work for most if not all your needs:

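function removeSentencesWithText($sentence, $text)
{
    // Escape regex metacharacters, then let any whitespace in the target text match any run of whitespace
    $text = preg_replace('#\s+#', '\s*', preg_quote($text, '#'));
    // Build the same expression as above around the target text
    $regex = "#(?<=.\s)[^.]*?" . $text . "[^.]*\b.#is";
    return preg_replace($regex, "", $sentence);
}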

$new_string = removeSentencesWithText($string, 'foo bar');


<?php

$string = "This is foo bar string. We're going to find instances
of 'foo bar' and remove those sentences from the
string. I'm going to try this using odd spacing
as well. See if it can detect when 'foo
bar' are on 1.5 different lines, but the same sentence.
We'd have to tweak it further if we wanted to match
HTML line-breaks though. It would probably be easiest
to simply convert them to normal line-breaks. Oh, yeah
'foo bar'";

echo preg_replace("#(?<=.\s)[^.]*?foo[\s]*bar[^.]*\b.#is", "", $string);

?>

 

This I'm going to try this using odd spacing as well. 5 different lines, but the same sentence. We'd have to tweak it further if we wanted to match HTML line-breaks though. It would probably be easiest to simply convert them to normal line-breaks.

 

It's much trickier than it seems. Your example works just as well for my first test string if you omit

(?<=.\s)

You also have to escape dots to get them to register as literal dots, outside of character classes anyway.
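
For what it's worth, here is one way the single-expression idea might be tightened up with escaped dots. This is only a sketch under the "period + whitespace ends a sentence" rule agreed earlier in the thread. The function name is made up, and it will still be fooled by abbreviations or by sentences that end in "?" or "!".

```php
<?php
// Hypothetical function name; a sketch under the "period + whitespace" rule only.
function remove_sentences_sketch(string $text, string $needle): string
{
    // Quote each word of the search text, then join the words with \s+ so
    // that spaces, tabs and line breaks between them all match.
    $words = preg_split('/\s+/', trim($needle));
    $quoted = array_map(function ($word) {
        return preg_quote($word, '#');
    }, $words);
    $needle_rx = implode('\s+', $quoted);

    // Any single character that does not start a "period + whitespace" break.
    // The period inside "1.5" is consumed; a period before a space is not.
    $inside = '(?:(?!\.\s).)*?';

    $regex = '#(?:^|(?<=\.\s))'       // start of text, or just after ". "
           . $inside . $needle_rx . $inside
           . '(?:\.(?=\s)|\.?$)'      // escaped period before whitespace, or end of text
           . '\s*#is';                // also swallow the whitespace after the sentence

    return rtrim(preg_replace($regex, '', $text));
}

$sentence = "The quick brown fox jumped over the lazy dog. This is a test. Thank you.";
echo remove_sentences_sketch($sentence, "quick brown"); // prints: This is a test. Thank you.
```

With the test string above, the "1.5" no longer cuts the sentence in half, because a period only counts as a sentence break when whitespace follows it.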
