Karl2019 Posted November 21, 2019

Hi! I have a longer text in which I want to distinguish between textblocks that contain a certain keyword and those that don't.

smaple woiefjeowijji oj oiewjfoewijfoiwejfiojewf keyword owiejfioejoij oiewjfioewjf smaple
smaple ojioewj fijo oieiojewf keyword owiejfioejoij oiewjfioewjf smaple
smaple woiefjeowijji fijo oiewjfoewijfoiwejf owiejfioejoij oiewjfioewjf smaple

1. The textblocks I want to find start with "sample" and end with "sample".
2. The textblocks I want to find then start with "sample", end with "sample", and do contain the "keyword".

I just can't find the right regular expression. What should I use?

Link to comment: https://forums.phpfreaks.com/topic/309563-looking-for-a-regex-to-find-multiline-text-blocks-that-a-may-or-b-may-not-contain-a-keyword/
Psycho Posted November 21, 2019

I'm not sure I follow what exactly you are trying to find. I see you gave some example input; it would have been helpful to see what you expect to be returned. Specifically, I'm not sure what you mean by "textblock". I *think* you mean where a line starts with 'sample' followed by however many lines until you find a line that ends with 'sample'. However, note that the word "sample" never appears in your text block. There is a word spelled "smaple" - I have no idea what that is.

How you word #1 and #2 is confusing as well. Are you saying you want to find the first textblock which starts with "sample" and ends with "sample" and then find the NEXT textblock the same way, but the second one contains the keyword? Or does #2 mean the keyword is supposed to be in the first textblock?
Psycho Posted November 22, 2019

This may work for you. Here is a function that returns the "textblocks" that begin and end with a delimiter string. If a keyword is provided, it only returns the "textblocks" that also contain that keyword. Because I build the regular expressions programmatically, it may be difficult to see how they are constructed. The two formats would look like this:

#^smaple.*?smaple#ms
and
#^smaple.*?keyword.*?smaple#ms

<?php

$text = "smaple woiefjeowijji oj oiewjfoewijfoiwejfiojewf keyword owiejfioejoij oiewjfioewjf smaple
smaple ojioewj fijo oieiojewf keyword owiejfioejoij oiewjfioewjf smaple
smaple woiefjeowijji fijo oiewjfoewijfoiwejf owiejfioejoij oiewjfioewjf smaple";

function findTextBlocks($input, $delimiter, $keyword='')
{
    // If a keyword was given, require it (lazily) before the closing delimiter
    if($keyword!='') { $keyword = "{$keyword}.*?"; }
    // m: ^ matches at every line start; s: the dot also matches line breaks
    $pattern = "#^{$delimiter}.*?{$keyword}{$delimiter}#ms";
    //echo $pattern; // uncomment to inspect the generated pattern
    preg_match_all($pattern, $input, $matches);
    return $matches;
}

$textBlocks = findTextBlocks($text, 'smaple');
echo "<pre>".print_r($textBlocks, true)."</pre>";

$textBlocksWithKeyword = findTextBlocks($text, 'smaple', 'keyword');
echo "<pre>".print_r($textBlocksWithKeyword, true)."</pre>";

?>

Output #1

Array
(
    [0] => Array
        (
            [0] => smaple woiefjeowijji oj oiewjfoewijfoiwejfiojewf keyword owiejfioejoij oiewjfioewjf smaple
            [1] => smaple ojioewj fijo oieiojewf keyword owiejfioejoij oiewjfioewjf smaple
            [2] => smaple woiefjeowijji fijo oiewjfoewijfoiwejf owiejfioejoij oiewjfioewjf smaple
        )

)

Output #2

Array
(
    [0] => Array
        (
            [0] => smaple woiefjeowijji oj oiewjfoewijfoiwejfiojewf keyword owiejfioejoij oiewjfioewjf smaple
            [1] => smaple ojioewj fijo oieiojewf keyword owiejfioejoij oiewjfioewjf smaple
        )

)
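One caveat with the keyword pattern may be worth illustrating: with the `s` flag, the lazy `.*?` can run past a block's closing delimiter when a block without the keyword comes before one that has it. The sketch below is not from the thread. The input string and the helper name `findBlocksWithKeyword` are assumptions made for illustration, and the guard it uses (a negative lookahead that refuses to step onto the delimiter) is one possible approach, not necessarily what the poster intended.

```php
<?php
// Hypothetical input: the block WITHOUT the keyword comes first.
$text = "smaple\nno match here\nsmaple\nsmaple\nhas keyword inside\nsmaple";

// Pattern from the post above: the lazy .*? keeps expanding until it
// reaches a "keyword", even if that keyword sits in a later block.
preg_match_all('#^smaple.*?keyword.*?smaple#ms', $text, $loose);

// Hypothetical helper (not from the thread): returns only blocks that contain
// $keyword and never lets a match run across a delimiter.
function findBlocksWithKeyword(string $input, string $delimiter, string $keyword): array
{
    // Escape both strings so regex metacharacters in them are taken literally.
    $d = preg_quote($delimiter, '#');
    $k = preg_quote($keyword, '#');
    // (?:(?!$d).) consumes one character at a time, but only while the
    // delimiter does not start at the current position.
    $pattern = "#^{$d}(?:(?!{$d}).)*?{$k}(?:(?!{$d}).)*{$d}#ms";
    preg_match_all($pattern, $input, $matches);
    return $matches[0];
}

$strict = findBlocksWithKeyword($text, 'smaple', 'keyword');

echo "<pre>".print_r($loose[0], true)."</pre>";
echo "<pre>".print_r($strict, true)."</pre>";
```

On this input the original pattern returns a single match that swallows both blocks, while the guarded variant returns only the second block. Because the start and end delimiters are the same word here, text lying between two blocks is still indistinguishable from text inside a block, so this sketch assumes the blocks follow each other directly.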
Karl2019 (Author) Posted November 22, 2019

I definitely was not exact enough. Let's see another example:

The whole text is made of multiline blocks that start with "start" and end with "end". Each textblock has either 0, 1 or many occurrences of "keyword".

In the first run I want to remove all textblocks that have 0 occurrences of the "keyword".
In the second run, on the original whole text, I want to remove all textblocks that have 1 or many occurrences of the "keyword".

start
wofj keyword wopkefpwoekf
end
start
oidfgoj pwefkoewfk
end
[and so on, many many times]

First run would remove the second textblock.
Second run would remove the first textblock.
Psycho Posted November 22, 2019

It's pretty frustrating when someone asks for help and then changes the requirements. It would also be helpful if you provided REAL content instead of gibberish.

<?php

$text = "start
00000000000000000 REMOVE ON FIRST PASS
end
start
11111111111111111111 XXXXXXX keyword XXXXXXXXXXX
end
start
oidfgoj 11111111111111111111 keyword REMOVE ON 2ND PASS
end
start
abcd keyword efg 2222222222222222222 REMOVE ON 2ND PASS abcd keyword abcd
end
SOME OTHER TEXT WILL NOT BE REPLACED
start
0000000000000000000 REMOVE ON FIRST PASS
end
start
keyword oidfgoj 33333333333333333333333 abcd REMOVE ON 2ND PASS abcd keyword fdsfs keyword pwefkoewfk
end";

function replaceTextBlocks($input, $startDelimiter, $endDelimiter, $keyword='keyword', $withKeyword=false)
{
    if($keyword!='') { $keyword = "{$keyword}.*?"; }
    //$pattern = "#^{$startDelimiter}.*?{$keyword}{$endDelimiter}#ms";
    if(!$withKeyword)
    {
        //Remove blocks w/o the keyword: no character in the block may start the keyword
        $pattern = "#^{$startDelimiter}((?!{$keyword}).)*{$endDelimiter}[\n\r]*#ms";
    }
    else
    {
        //Remove blocks with the keyword
        $pattern = "#^{$startDelimiter}.*?{$keyword}.*?{$endDelimiter}[\n\r]*#ms";
    }
    //echo $pattern;
    $newText = preg_replace($pattern, '', $input);
    return $newText;
}

echo "Original text: <pre>{$text}</pre><br>\n";

//First pass: remove the blocks without the keyword
$text = replaceTextBlocks($text, 'start', 'end', 'keyword');
echo "First Pass: <pre>{$text}</pre><br>\n";

//Second pass: runs on the result of the first pass and removes the blocks with the keyword
$text = replaceTextBlocks($text, 'start', 'end', 'keyword', true);
echo "Second Pass: <pre>{$text}</pre><br>\n";

?>
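Since the request was for both runs to start from the original text, a different way to get there may be to match one block at a time and decide in PHP whether to drop it, rather than encoding the keyword test in the pattern. The sketch below is not from the thread: the helper name `removeBlocks` is hypothetical, the sample string uses an assumed line layout, and the `\b` word boundaries assume that both delimiters are plain words.

```php
<?php
// Hypothetical helper (not from the thread): removes start...end blocks
// depending on whether they contain $keyword.
function removeBlocks(string $input, string $start, string $end, string $keyword, bool $withKeyword): string
{
    $s = preg_quote($start, '#');
    $e = preg_quote($end, '#');
    // One block per match: a line beginning with $start, lazily up to the
    // first following $end, plus any trailing line breaks.
    // Assumption: both delimiters are plain words, so \b is meaningful.
    $pattern = "#^{$s}\\b.*?\\b{$e}\\b[\\r\\n]*#ms";

    $result = preg_replace_callback(
        $pattern,
        function (array $m) use ($keyword, $withKeyword): string {
            $hasKeyword = strpos($m[0], $keyword) !== false;
            // Drop the block when its keyword status matches the requested mode.
            return $hasKeyword === $withKeyword ? '' : $m[0];
        },
        $input
    );

    // preg_replace_callback returns null on error; fall back to the input.
    return $result ?? $input;
}

// Assumed layout of the two-block example from the question.
$original = "start\nwofj keyword wopkefpwoekf\nend\nstart\noidfgoj pwefkoewfk\nend\n";

// Both runs start from the same original text.
$firstRun  = removeBlocks($original, 'start', 'end', 'keyword', false);
$secondRun = removeBlocks($original, 'start', 'end', 'keyword', true);

echo "First run: <pre>{$firstRun}</pre><br>\n";
echo "Second run: <pre>{$secondRun}</pre><br>\n";
```

With this input the first run keeps only the block that contains the keyword and the second run keeps only the block without it, which lines up with the behaviour described in the question. Text outside any start...end block is never matched, so it passes through unchanged.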