dpacmittal (thread author), posted September 14, 2009:
I am trying to use a regex to extract the first 80 words of a post, to be used as an excerpt in WordPress. I know of other ways, but I want to know how to achieve this using regexes.
Garethp, posted September 14, 2009:
    <?php
    // Capture each run of word characters as one match
    preg_match_all('~(\w+)\s?~', $String, $Matches);
    // Drop everything after the first 80 matches (keys 0 to 79)
    foreach ($Matches[1] as $k => $v) {
        if ($k > 79) {
            unset($Matches[1][$k]);
        }
    }
    $Matches = $Matches[1];
    print_r($Matches);
    ?>
Adam, posted September 14, 2009:
Try this:
    if (preg_match('#([\w]+\s?){0,80}#', $str, $matches)) {
        print $matches[0];
    }
Edit: Actually, this would run into problems with characters such as quotes, commas, dots, etc. Hmm.
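One possible way around the punctuation problem Adam mentions is to treat a word as any run of non-whitespace characters (\S+) instead of word characters (\w+). The following is only a sketch under that assumption; the helper name first_words is hypothetical and does not come from the thread or from WordPress.

```php
<?php
// Hypothetical helper: returns the first $limit whitespace-separated words
// of $text, keeping punctuation and the original spacing between words.
function first_words(string $text, int $limit): string
{
    if ($limit < 1) {
        return '';
    }
    // Skip leading whitespace, then capture one word followed by up to
    // $limit - 1 further "whitespace + word" pairs.
    $pattern = '~^\s*(\S+(?:\s+\S+){0,' . ($limit - 1) . '})~';
    return preg_match($pattern, $text, $m) ? $m[1] : '';
}

echo first_words("Hello, world! It's a \"quoted\" test.", 3), "\n";
```

Because the words and the gaps between them are matched in one pass, quotes, commas, and dots stay attached to the words they belong to.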
Garethp, posted September 14, 2009:
That doesn't actually work. I forget why, but I remember reading that a pattern like that will only match the last word, and after testing it, that holds true.
Adam, posted September 14, 2009:
Mine? I have a working test of it, albeit on simple text. I wasn't thinking about punctuation at the time.
    <?php
    $str = 'one two three four five six seven eight nine ten one two three four five six seven eight nine ten one two three four five six seven eight nine ten';
    if (preg_match('#([\w]+\s?){0,30}#', $str, $matches)) {
        print $matches[0];
    }
    ?>
Garethp, posted September 14, 2009:
Oh! You're echoing index 0. Hahah, I was using this code:
    <?php
    $str = 'one two three four five six seven eight nine ten one two three four five six seven eight nine ten one two three four five six seven eight nine ten';
    preg_match('#([\w]+\s?){0,5}#', $str, $matches);
    print_r($matches[1]);
    ?>
It printed index 1 instead, which is where it went wrong.
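The difference between the two indexes comes from how a capture group behaves under a quantifier: the group is overwritten on every repetition, so after the match it holds only the final one, while index 0 always holds the whole match. A small sketch, using a shortened test string chosen here for illustration:

```php
<?php
$str = 'one two three four five six';

// The group (\w+\s?) repeats up to five times. Each repetition replaces
// the previously captured text, so only the last one survives.
preg_match('#(\w+\s?){0,5}#', $str, $matches);

echo $matches[0], "\n"; // the whole match: the first five words
echo $matches[1], "\n"; // only the last repetition of the group
```

Note that the optional \s? also consumes the space after the fifth word, so the whole match may need an rtrim() before it is used as an excerpt.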
dpacmittal (thread author), posted September 14, 2009:
What I did was similar to Mr. Adam's solution:
    preg_match('!(\s*\S*){1,50}!', $str, $matches);
    var_dump($matches);
Garethp wrote: "That doesn't actually work. I forget why, but I remember reading that a pattern like that will only match the last word, and after testing it, that holds true"
True, but I can always use $matches[0] to get what I need. I am average at regexes; can you explain your regex?
Garethp, posted September 14, 2009:
Mine works like Adam's: it searches for all word characters, without a limit. Then it runs through $Matches[1] and unsets everything except the first 80 matches.
.josh, posted September 14, 2009:
    preg_match_all('~\S+~', $string, $matches);
    // array_slice() takes an offset and a length, so 80 keeps the first 80 words
    $newString = implode(' ', array_slice($matches[0], 0, 80));
    echo $newString;
dpacmittal (thread author), posted September 14, 2009:
.josh wrote:
    preg_match_all('~\S+~', $string, $matches);
    $newString = implode(' ', array_slice($matches[0], 0, 80));
    echo $newString;
This removes the newlines and tabs as well, doesn't it?
.josh, posted September 14, 2009:
It matches anything that is not whitespace (\S). The lowercase shortcut \s covers space, tab, and newline, so \S skips all of them. In the end, $newString will not have any tabs or newlines.
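If the tabs and newlines should survive in the excerpt, one option is to split on whitespace while keeping the delimiters, then rejoin the pieces unchanged. This is a sketch rather than something proposed in the thread; it relies on preg_split() with the PREG_SPLIT_DELIM_CAPTURE flag, and the sample string and word count are made up for illustration.

```php
<?php
$string = "first line\n\tsecond line\nthird line";
$words  = 4; // illustrative word limit

// Approach from the thread: collect the words, then re-join them with
// single spaces. Every tab and newline between words becomes one space.
preg_match_all('~\S+~', $string, $matches);
$collapsed = implode(' ', array_slice($matches[0], 0, $words));

// Alternative sketch: split on whitespace but keep each delimiter, so the
// array alternates word, whitespace, word, ... for the trimmed input.
$parts = preg_split('~(\s+)~', trim($string), -1, PREG_SPLIT_DELIM_CAPTURE);
// N words plus the N - 1 gaps between them is 2N - 1 elements.
$preserved = implode('', array_slice($parts, 0, 2 * $words - 1));

var_dump($collapsed);
var_dump($preserved);
```

The first result reads as a single line, while the second keeps the newline and tab that sat between the second and third words of the input.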