johnsmith153, posted June 20, 2012

I have a text string (which I can't change). I need to remove from it every entire sentence in which a provided value appears somewhere. Example:

    $sentence = "The quick brown fox jumped over the lazy dog. This is a test. Thank you.";

    function remove_sentence($remove_value, $sentence)
    {
        // remove all sentences containing $remove_value from $sentence
        return $new_sentence;
    }

    remove_sentence("quick brown", $sentence); // returns 'This is a test. Thank you.'

Can someone help me complete this?
trq, posted June 20, 2012

str_replace.
johnsmith153 (author), posted June 20, 2012

No, I need the whole sentence. As my example shows, I provide a value and it removes every entire sentence that contains that value. I'm pretty sure str_replace won't do that.
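A quick check of why str_replace() alone is not enough, using the example string from the question: it deletes only the matched characters and leaves the rest of the sentence (plus a doubled space) behind.

```php
<?php
$sentence = "The quick brown fox jumped over the lazy dog. This is a test. Thank you.";

// str_replace() removes only the search value itself, not the sentence
// that contains it.
$result = str_replace('quick brown', '', $sentence);
echo $result, "\n"; // The  fox jumped over the lazy dog. This is a test. Thank you.
```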
Psycho, posted June 20, 2012

That's a pretty tall order, due to the complexity of what you are asking. You could use a regular expression that looks for a segment of the string containing the text, where the segment either begins the string or follows a period, and ends with a period. But that is not foolproof: if a sentence contains something like "$12.35", the decimal point would be treated as an ending period. You could refine that so a segment ends in a period followed by a space, or in a period that is the last character of the string, but it still doesn't cover every possible scenario.
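A small illustration of the "$12.35" problem, assuming a split-based approach rather than one big regex (the sample text is made up for the demonstration): splitting on every period breaks the price apart, while splitting only on whitespace that follows a period keeps it intact, and the last sentence needs no trailing space.

```php
<?php
$text = 'It costs $12.35 today. Buy it now.';

// Treating every period as a sentence end splits the price in two:
// ["It costs $12", "35 today", " Buy it now", ""]
$naive = explode('.', $text);

// Splitting only on whitespace preceded by a period keeps "$12.35" whole:
// ["It costs $12.35 today.", "Buy it now."]
$better = preg_split('/(?<=\.)\s+/', $text);

print_r($naive);
print_r($better);
```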
johnsmith153 (author), posted June 20, 2012

I'm happy with a semi-foolproof solution. I think one where a sentence ends with ". " (a full stop followed by a space) is good enough for me. I would appreciate it if somebody could point me in the right direction on a regular expression that would do this. I'm guessing preg_replace, but as to the regular expression, I am clueless. Thanks for everybody's input so far.
websoftexpert, posted June 20, 2012

Use strstr(); if it returns true, reset the $sentence variable. Thanks.
johnsmith153 (author), posted June 20, 2012

Replying to websoftexpert's strstr() suggestion: well, obviously that's not going to work. Thanks for trying, though.
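For reference, strstr() never returns true: it returns the rest of the string from the first match onward, or false when there is no match. So it can tell you that the value occurs, but not where the surrounding sentence begins or ends. A minimal check with the question's example string:

```php
<?php
$sentence = "The quick brown fox jumped over the lazy dog. This is a test. Thank you.";

// Everything from the first "quick brown" to the end of the string.
$tail = strstr($sentence, 'quick brown');
echo $tail, "\n"; // quick brown fox jumped over the lazy dog. This is a test. Thank you.

// No match: false, not an empty string.
$missing = strstr($sentence, 'purple');
var_dump($missing); // bool(false)
```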
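ionutvmi, posted June 20, 2012

You could try this. From what you described, it will do the job:

    <?php
    $sentence = "The quick brown fox jumped over the lazy dog. This is a test. Thank you.";

    function remove_sentence($remove_value, $sentence)
    {
        // Drop a trailing full stop so every piece from explode() is a bare sentence.
        $ends_with_period = substr($sentence, -1) === '.';
        $body = $ends_with_period ? substr($sentence, 0, -1) : $sentence;

        $kept = array();
        foreach (explode('. ', $body) as $part) {
            // Keep only the sentences that do not contain the search value.
            if (strpos($part, $remove_value) === false) {
                $kept[] = $part;
            }
        }

        // Re-join with the same separator and restore the final full stop.
        $new_sentence = implode('. ', $kept);
        return ($ends_with_period && $new_sentence !== '') ? $new_sentence . '.' : $new_sentence;
    }

    echo remove_sentence("quick brown", $sentence); // prints 'This is a test. Thank you.'
    ?>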
Jessica, posted June 20, 2012

Quoting johnsmith153: "I think one which ends with a ". " (full stop followed by a space) is good enough for me."

I see a problem: the last sentence probably won't have a space after its period.
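Jessica's point is easy to see with explode(), using the question's example string: the last piece keeps its full stop because nothing follows it, so naively appending "." to every piece doubles that period, while implode() with the same separator restores the original text exactly.

```php
<?php
$sentence = "The quick brown fox jumped over the lazy dog. This is a test. Thank you.";

// ["The quick brown fox jumped over the lazy dog", "This is a test", "Thank you."]
$parts = explode('. ', $sentence);

// Re-adding "." to every piece doubles the final period (and loses the spaces).
$rebuilt = '';
foreach ($parts as $part) {
    $rebuilt .= $part . '.';
}
echo $rebuilt, "\n"; // The quick brown fox jumped over the lazy dog.This is a test.Thank you..

// Joining with the same separator gives back the original string.
echo implode('. ', $parts), "\n";
```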
Psycho, posted June 20, 2012

Quoting Jessica: "I see a problem. The last sentence probably won't have a space after the period."

Correct. I had thought about that but didn't include it in my first response, as it was late. But the real trick is not finding the search string within a sentence; it is determining what constitutes a sentence. The "best" solution I can think of is as follows:

The beginning of a sentence is either:
- The start of the entire text being processed
- A character that follows a period + 1 or more whitespace characters (space, tab, line break, etc.)

The end of a sentence is the first of the following that comes after the beginning of a sentence:
- The end of the entire text being processed
- A period + space

And I'm sure if I gave it more thought I would find potential flaws in that. I know you can use ^ and $ in a regex for the start and end of the string, but that won't work for this. I think there are separate start/end characters that can be used for this, but I forget how they are implemented.
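One way to sketch the definition above without a single large regex is to split with a lookbehind and then filter. This is a minimal illustration rather than code from the thread: remove_sentences() is a hypothetical name, it treats only full stops as sentence ends, matches case-sensitively with strpos(), and rejoins the kept sentences with a single space. On the start/end question: PCRE does provide \A and \z, which anchor the absolute start and end of the subject regardless of the m modifier, but the split approach does not need them.

```php
<?php
// Hypothetical helper sketching the sentence definition above.
function remove_sentences(string $text, string $needle): string
{
    // A sentence ends at a period followed by whitespace, or at the end of the
    // text. Splitting on whitespace that is preceded by a period keeps each
    // period attached to its sentence, and "$12.35" is not split because that
    // period is followed by a digit.
    $sentences = preg_split('/(?<=\.)\s+/', trim($text));

    // Keep only the sentences that do not contain the search text.
    $kept = array_filter($sentences, function ($s) use ($needle) {
        return strpos($s, $needle) === false;
    });

    // Rejoin with a single space (line breaks between sentences are not kept).
    return implode(' ', $kept);
}

$text = 'The quick brown fox jumped over the lazy dog. It cost $12.35. Thank you.';
echo remove_sentences($text, 'quick brown'), "\n"; // It cost $12.35. Thank you.
echo remove_sentences($text, 'Thank'), "\n";       // The quick brown fox jumped over the lazy dog. It cost $12.35.
```

Unlike the explode-based versions, the final sentence keeps its full stop even when the sentence before it is removed.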
xyph, posted June 20, 2012

Parsing this in a single expression would get really ugly. It's much 'simpler' to split with one expression, and then use string functions to find a simple match, or another expression to find complex matches.

    <?php
    $string = "This is a string. We're going to find instances of 'foo bar' and remove those sentences from the string. I'm going to try this using odd spacing as well. See if it can detect when 'foo bar' are on different lines, but the same sentence. We'd have to tweak it further if we wanted to match HTML line-breaks though. It would probably be easiest to simply convert them to normal line-breaks. Oh, yeah 'foo bar'";

    function get_rid_of_some_sentences( $string, $containing ) {
        // Escape regex metacharacters (including the # delimiter), then
        // convert a space to at least 1 whitespace character
        $containing = str_replace(' ', '\s+', preg_quote($containing, '#'));
        // Split into sentences on a period followed by whitespace
        $parts = preg_split('/\.\s/', $string);
        for( $i = 0, $max = count($parts); $i < $max; $i++ ) {
            // Drop every sentence that contains the search text (case-insensitive)
            if( preg_match('#'.$containing.'#i', $parts[$i]) )
                unset($parts[$i]);
        }
        return implode('. ', $parts);
    }

    echo get_rid_of_some_sentences($string, 'foo bar');
    ?>

Works okay. Let me know if there's anything that breaks it.
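The filtering loop in get_rid_of_some_sentences() could also be written with preg_grep() and its PREG_GREP_INVERT flag, which returns only the array elements that do not match the pattern. This is a sketch under the same assumptions as xyph's version (sentences split on a period followed by whitespace, case-insensitive match); remove_matching_sentences() is a hypothetical name.

```php
<?php
function remove_matching_sentences(string $string, string $containing): string
{
    // Escape the search text and let each space match one or more whitespace
    // characters (spaces, tabs, line breaks).
    $pattern = '#' . str_replace(' ', '\s+', preg_quote($containing, '#')) . '#i';

    // Split into sentences on a period followed by whitespace.
    $parts = preg_split('/\.\s/', $string);

    // PREG_GREP_INVERT keeps only the elements that do NOT match.
    $kept = preg_grep($pattern, $parts, PREG_GREP_INVERT);

    return implode('. ', $kept);
}

$string = "Keep this. Drop 'foo\n   bar' here. Keep that too.";
echo remove_matching_sentences($string, 'foo bar'), "\n"; // Keep this. Keep that too.
```

preg_grep() preserves the original keys, but implode() ignores keys, so the gaps left by removed sentences do not matter.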
Psycho, posted June 21, 2012

OK, playing around with "Lookahead and Lookbehind Zero-Width Assertions", I was able to write a regular expression that does all of this in one go. It looked like a fun project that would help me learn some more complex regex. Using the same $string as in xyph's sample code, I used this expression:

    #(?<=.\s)[^.]*?foo[\s]*bar[^.]*\b.#is

To test it, I used preg_match_all() to verify what was being captured:

    preg_match_all("#(?<=.\s)[^.]*?foo[\s]*bar[^.]*\b.#is", $string, $matches);

Here are the results:

    Array
    (
        [0] => Array
            (
                [0] => We're going to find instances of 'foo bar' and remove those sentences from the string.
                [1] => See if it can detect when 'foo bar' are on different lines, but the same sentence.
                [2] => Oh, yeah 'foo bar'
            )
    )

So, going back to your requirement to remove all sentences that contain the target text, you can do it in one line:

    $new_string = preg_replace("#(?<=.\s)[^.]*?foo[\s]*bar[^.]*\b.#is", "", $string);

EDIT: Assuming this needs to be done with target text that varies, you would want some process to format that string before using it in the regex. For example, for "foo bar" I replaced the space with "[\s]*" to handle scenarios where there is a line break, tab, etc. between the words. To accomplish that with variable target text, you can do a replacement on the target text first. So here is a function that should work for most, if not all, of your needs:

    function removeSentencesWithText($sentence, $text)
    {
        // Escape regex metacharacters, then replace each run of whitespace with
        // [\s]*, so 'foo bar' becomes the foo[\s]*bar fragment used above.
        $text = preg_replace('#\s+#', '[\s]*', preg_quote($text, '#'));
        // Build the pattern from the prepared text instead of hard-coding it.
        $regex = "#(?<=.\s)[^.]*?" . $text . "[^.]*\b.#is";
        return preg_replace($regex, "", $sentence);
    }

    $new_string = removeSentencesWithText($string, 'foo bar');
xyph, posted June 21, 2012

    <?php
    $string = "This is foo bar string. We're going to find instances of 'foo bar' and remove those sentences from the string. I'm going to try this using odd spacing as well. See if it can detect when 'foo bar' are on 1.5 different lines, but the same sentence. We'd have to tweak it further if we wanted to match HTML line-breaks though. It would probably be easiest to simply convert them to normal line-breaks. Oh, yeah 'foo bar'";
    echo preg_replace("#(?<=.\s)[^.]*?foo[\s]*bar[^.]*\b.#is", "", $string);
    ?>

Output:

    This I'm going to try this using odd spacing as well. 5 different lines, but the same sentence. We'd have to tweak it further if we wanted to match HTML line-breaks though. It would probably be easiest to simply convert them to normal line-breaks.

It's much trickier than it seems. Your example works just as well for my first test string if you omit (?<=.\s). You also have to escape dots to get them to register as literal dots, outside of character classes anyway.
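Both of xyph's observations can be addressed in one pass with preg_replace_callback(): match one whole sentence at a time, with the period escaped and required to be followed by whitespace or the end of the text, and drop the sentences that contain the search text. This is a sketch under stated assumptions, not code from the thread: remove_sentences_one_pass() is a hypothetical name, only full stops (not "?" or "!") end a sentence, and the match is case-insensitive like xyph's.

```php
<?php
function remove_sentences_one_pass(string $text, string $containing): string
{
    // Escape the search text (delimiter included) and let each space match
    // any run of whitespace.
    $needle = '/' . str_replace(' ', '\s+', preg_quote($containing, '/')) . '/i';

    // One sentence: a non-space character, then as little as possible up to an
    // escaped period followed by whitespace or the end of the text (or just the
    // end of the text), plus any trailing whitespace. Matching starts at the
    // first non-space character, so the first sentence needs no lookbehind,
    // and "1.5" is not a boundary because its period is followed by a digit.
    $sentence = '/\S.*?(?:\.(?=\s|\z)|\z)\s*/s';

    $result = preg_replace_callback($sentence, function ($m) use ($needle) {
        // Remove the whole sentence (and its trailing whitespace) on a match.
        return preg_match($needle, $m[0]) ? '' : $m[0];
    }, $text);

    // A kept sentence before a removed final one leaves trailing whitespace.
    return rtrim($result);
}

$text = "This is foo bar string. Keep this one. Detect 'foo bar' on 1.5 lines. Keep me too.";
echo remove_sentences_one_pass($text, 'foo bar'), "\n"; // Keep this one. Keep me too.
```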