BagoZonde, posted February 11, 2013 (edited February 11, 2013)

I'm looking for a regex pattern that extracts whole words from a substring taken from the middle of a text. Words should be separated by a space, tab, or CR. Here's a simple example:

    $string = "The quick brown fox jumps over the lazy dog";
    print preg_replace($pattern, '', substr($string, 7, 15));

In this example substr() returns "ck brown fox ju", and from that string I want to get:

    brown fox

Of course I'm aware of the cases where the substring starts at position 0 or ends at the last character of the string, but those are easy; I just need the regex pattern. It looks like a common case, but I tried \b, \w, \s and other things myself and then searched the net, and I haven't found a solution yet. I'd appreciate any help; I'm tired of regex for today. I've written this function using iteration with substr(), but I'm not satisfied with it. I'm looking for something more elegant, and I'd like to learn more about regex. Thanks!

Thread: https://forums.phpfreaks.com/topic/274347-substring-whole-words-from-any-position/
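A minimal sketch of one pattern that produces the result asked for above. It is an illustration, not something posted in the thread, and it assumes that the slice is cut mid-word at both ends: everything up to and including the first run of whitespace is removed, and so is everything from the last run of whitespace onward.

```php
<?php
$string = "The quick brown fox jumps over the lazy dog";
$slice  = substr($string, 7, 15); // "ck brown fox ju"

// \A\S*\s+  leading non-whitespace run plus the whitespace that follows it
// \s+\S*\z  last whitespace run plus whatever non-whitespace follows it
// \s covers space, tab, CR and LF, so no separate alternatives are needed.
$pattern = '/\A\S*\s+|\s+\S*\z/';

$whole = preg_replace($pattern, '', $slice);
print $whole;
```

If the slice happens to start or end exactly on a word boundary, this pattern still drops that first or last word, so those cases need a separate check.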
Christian F., posted February 11, 2013

The regular expression is actually the easy part of this request. The hard part comes when you want to check the words against the English vocabulary. You have to figure out how a script, run by a computer that has absolutely zero reading comprehension, can decide what constitutes a "valid" word in English. Regular expressions can only tell you whether a collection of characters follows the structure of a "word", not whether it's actually a word or just some random data that looks like one.

Anyway, the RegExp you need is this:

    '/([a-z\\pL]+)/iu'

Use that with preg_match_all() and you'll get everything that consists only of one or more letters.
BagoZonde (author), posted February 13, 2013

Thank you very much, Christian. It doesn't meet my requirements, but I'm on the right track thanks to you. I need to break words on space, tab and CR, as I mentioned in my first post. Your code gives words divided on the space character only. I understand that it takes letters only, but I want to divide the text into an array on space, tab and CR, so I need a pattern that excludes exactly those characters. As for commas and colons: they should stay attached to the word or digit as they are. Another example will show exactly what I'm looking for:

    The bed is a bundle of paradoxes: we go to it with reluctance,
    yet we quit it with regret;
    we make up our minds every night to leave it early,
    but we make up our bodies every morning to keep it late.

    Ogden Nash

I am writing a simple text search engine. Say I was looking for the word "early", and now I want to show the result as an excerpt. If the word "early" was found at position 154, I want to take the range -100 : +100. So I only need to cut that part of the string into an array of words (with commas and other characters), then unset the first and last word (as they would not be whole words), then implode with a space character.

Using \S I can split out the words, but I don't know how to split on CR (^\n and ^\r don't work). I was trying this one:

    preg_match_all('/([\S]+)/', $string, $words);

And why is the resulting array doubled?

I found preg_split(), so I think it would be better to focus on it in this case:

    $words = preg_split( "/(\s|\t)/", $string);

However, \n, \r, or even \x0D or \x0D\x0A (chr(13).chr(10)) don't seem to work for me in this alternation. And I'm not sure about tab either.

Thank you for your interest!
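On the "doubled" array: preg_match_all() always stores the full matches in index 0 and then one further sub-array per capture group, so a pattern wrapped in parentheses returns the same words twice. A small sketch with an assumed sample string:

```php
<?php
$string = "we quit it with regret;\r\n\twe make up our minds";

// With a capture group: index 0 = full matches, index 1 = group 1.
preg_match_all('/(\S+)/', $string, $withGroup);

// Without the group: only index 0 is filled.
preg_match_all('/\S+/', $string, $noGroup);

// Punctuation stays attached because \S matches any non-whitespace character.
$words = $noGroup[0];
print_r($words);
```

Dropping the parentheses, or simply reading index 0, avoids the duplicate.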
Christian F., posted February 13, 2013 (edited February 13, 2013)

The regular expression I posted above does indeed work as requested:

    php > $string = "The bed is a bundle of paradoxes: we go to it with reluctance,
    php " yet we quit it with regret;
    php " we make up our minds every night to leave it early,
    php " but we make up our bodies every morning to keep it late.
    php "
    php " Ogden Nash"; //"
    php > preg_match_all ('/([a-z\\pL]+)/iu', $string, $matches);
    php > var_dump ($matches);
    array(2) {
      [0]=>
      array(44) {
        [0]=> string(3) "The"
        [1]=> string(3) "bed"
        [2]=> string(2) "is"
        [3]=> string(1) "a"
        [4]=> string(6) "bundle"
        [5]=> string(2) "of"
        [6]=> string(9) "paradoxes"
        [7]=> string(2) "we"
        [8]=> string(2) "go"
        [9]=> string(2) "to"
        [10]=> string(2) "it"
        [11]=> string(4) "with"
        [12]=> string(10) "reluctance"
        [13]=> string(3) "yet"
        [14]=> string(2) "we"
        [15]=> string(4) "quit"
        [16]=> string(2) "it"
        [17]=> string(4) "with"
        [18]=> string(6) "regret"
        [19]=> string(2) "we"
        [20]=> string(4) "make"
        [21]=> string(2) "up"
        [22]=> string(3) "our"
        [23]=> string(5) "minds"
        [24]=> string(5) "every"
        [25]=> string(5) "night"
        [26]=> string(2) "to"
        [27]=> string(5) "leave"
        [28]=> string(2) "it"
        [29]=> string(5) "early"
        [30]=> string(3) "but"
        [31]=> string(2) "we"
        [32]=> string(4) "make"
        [33]=> string(2) "up"
        [34]=> string(3) "our"
        [35]=> string(6) "bodies"
        [36]=> string(5) "every"
        [37]=> string(7) "morning"
        [38]=> string(2) "to"
        [39]=> string(4) "keep"
        [40]=> string(2) "it"
        [41]=> string(4) "late"
        [42]=> string(5) "Ogden"
        [43]=> string(4) "Nash"
      }
      // Snipped repeating array.
BagoZonde (author), posted February 13, 2013 (edited February 13, 2013)

Unfortunately not, because the commas and semicolon are missing. Haven't you noticed that? I want to cut a range from this string into words (with semicolons, etc.), then implode it back into a string with space characters, because I want to display something like this:

    ...minds every night to leave it early, but we make up our bodies every morning...

For now, preg_split() works like a charm for my purposes, but I can't separate words if a CR is between them. It's just an excerpt of the text: if I'm looking for the word "early", I want to see some of its context, something like what the Google search engine does.
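The excerpt described above (take a range around the hit, split it into words, drop the cut-off first and last word, join with spaces) could be sketched roughly as follows. The function name contextSnippet and its $radius parameter are hypothetical and not from the thread. Offsets are counted in bytes, so multibyte text would need the mb_* functions instead.

```php
<?php
// Hypothetical helper: whole words within $radius bytes of the first hit.
function contextSnippet(string $text, string $needle, int $radius = 100): ?string
{
    $pos = stripos($text, $needle);
    if ($pos === false) {
        return null; // needle not found
    }

    $start = max(0, $pos - $radius);
    $slice = substr($text, $start, $pos - $start + strlen($needle) + $radius);

    // Split on any run of whitespace: space, tab, CR or LF.
    $words = preg_split('/\s+/', $slice, -1, PREG_SPLIT_NO_EMPTY);

    // Drop an edge word only on a side where the text was actually cut.
    if ($start > 0) {
        array_shift($words);
    }
    if ($start + strlen($slice) < strlen($text)) {
        array_pop($words);
    }

    return implode(' ', $words);
}

$quote = "The bed is a bundle of paradoxes: we go to it with reluctance,\n"
       . "yet we quit it with regret;\n"
       . "we make up our minds every night to leave it early,\n"
       . "but we make up our bodies every morning to keep it late.";

echo '...' . contextSnippet($quote, 'early', 40) . '...';
```

Because the edge words are dropped without checking whether they were really cut, a whole word is occasionally lost at the boundary; that matches the "unset first and last word" approach described in the thread.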
Jessica, posted February 13, 2013

Christian's code does work with commas; he has one: "reluctance,"
BagoZonde (author), posted February 13, 2013 (edited February 13, 2013)

Unfortunately I can't see any commas in his post, not even for "reluctance" (I ran that pattern on my server and the same output was printed). And there's no "regret;", "early," or "late.".
Jessica, posted February 13, 2013

Try clicking the spoiler button. It's all there.
Christian F., posted February 13, 2013 (edited February 13, 2013)

Jessica: It seems that he wants the punctuation as part of the results, not just the words.

BagoZonde: As stated, matching the words is not the same as matching the words and the punctuation. However, it is easy to remedy: just add the punctuation you want to match in a character group after the "word" character group, and make it optional. Do take note that this will make it impossible to validate the words as proper English unless you manually strip out the punctuation marks first. Again, that is contrary to what you asked for in your original post.
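One possible reading of that suggestion, as a sketch: the letter class is followed by an optional punctuation class. The particular set of punctuation marks and the sample string are assumptions chosen for illustration.

```php
<?php
$string = "with reluctance, yet we quit it with regret; we keep it late.";

// \pL+        one or more Unicode letters (the "word" group)
// [,;:.!?]?   at most one trailing punctuation mark (assumed set)
preg_match_all('/\pL+[,;:.!?]?/u', $string, $matches);

print_r($matches[0]);
```

This is a whitelist approach: any punctuation mark not listed in the second class is silently dropped from the results.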
BagoZonde (author), posted February 13, 2013

Hello Christian! Yes, I want the words with their punctuation. The subject of this thread is a short description of what I wrote in my first post as my goal: I want to break on CR, space or tab. I don't want to specify every punctuation mark, as it's easier and safer to say where to break. So I want a "blacklist", not a "whitelist". Is there an easy way to break the string on CR, tab or space? I know how to break on the space character, as I mentioned in my second post in this thread, but I have no idea how to include CR or tab too.
BagoZonde (author), posted February 14, 2013 (edited February 14, 2013)

OK, it was easy. I found this pattern:

    $carriage=preg_split('/(\s|\t|\r)/', $string);

Cheers!
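A note on that pattern, with a hedged sketch: in PCRE, \s already matches space, tab, CR and LF, so the \t and \r alternatives add nothing. Splitting on a single whitespace character also leaves an empty element wherever two separators are adjacent, for example between the CR and LF of a Windows line ending. Splitting on runs of whitespace and discarding empty pieces avoids that. The sample string below is an assumption.

```php
<?php
$string = "leave it early,\r\nbut we\tmake up";

// Pattern from the thread: one separator per match, so the CRLF pair
// produces an empty string between "early," and "but".
$single = preg_split('/(\s|\t|\r)/', $string);

// Treat a whole run of whitespace as one separator and drop empty pieces.
$words = preg_split('/\s+/', $string, -1, PREG_SPLIT_NO_EMPTY);

print_r($words);
```

Joining $words with implode(' ', $words) then gives the single-spaced text wanted for display.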