gerkintrigg Posted November 30, 2009

What's the easiest way of counting specific words in a string? If I had a string like this:

    hear heard hearing heard <b>hear hear hear</b>hear

and counted them using any of the popular methods like substr_count, I'd end up with anything from 3 to 8. I want it to pick up (like a human would) ONLY the word "hear" and not "heard" or "hearing", but it should also cope with tags directly before or after the word and count those occurrences too. If I explode the string on spaces, it won't pick up the words located right next to a tag. I tried using preg_match and preg_match_all but can't work out how to count the results of the matching. Could someone please help?
Psycho Posted November 30, 2009

gerkintrigg wrote: "I want it to pick up (like a human would) ONLY the word "hear" and not "heard" or "hearing"..."

??? How are "heard" or "hearing" not words? If I were doing a word count manually I would count them as words. If you want code that does not count different tenses of a word, you have a very, very long project on your hands. You will need to build an extensive dictionary of words and their tenses. As for the HTML tags, I would suggest using preg_replace to remove any tags before doing the word counts (and also to collapse multiple spaces). Then do an explode to create an array of each word in the string, and use array_unique() to end up with only the unique words. You will then need to create a custom function to remove different tenses of words.
oni-kun Posted November 30, 2009

Doesn't he just mean to match "[:space:]hear[:space:]"? That is basically a reliable way, and you'd only need to define a few simple patterns such as (space)hear(?!) etc.
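To illustrate the whitespace-delimited idea, here is a rough sketch run against the sample string from the first post. It assumes the POSIX class is written inside a bracket expression, as `[[:space:]]`, which is the form PCRE expects. The variable names are only for illustration. A pattern that consumes the surrounding spaces misses a word at the very start or end of the string, and a lookaround variant avoids consuming the separators, but both still skip words that touch a tag.

```php
<?php
$string = 'hear heard hearing heard <b>hear hear hear</b>hear';

// Requires (and consumes) a whitespace character on both sides of the word.
$consuming = preg_match_all('~[[:space:]]hear[[:space:]]~', $string);

// Zero-width checks: no non-space character directly before or after the word.
$lookaround = preg_match_all('~(?<!\S)hear(?!\S)~', $string);

echo "$consuming $lookaround\n"; // prints "1 2"
```

Neither count reaches the 5 a human would give, because "hear" directly next to `<b>` or `</b>` is bounded by a non-space character.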
cags Posted November 30, 2009

I believe the OP's objective is simply to count the number of instances of a specified word. In the example given that word is 'hear', which shouldn't match 'heard' or 'hearing'. It also shouldn't match anything inside an HTML tag, such as '<div class="hear">'. It just wasn't terribly well explained. To count them you should only need preg_match_all (using the word boundary solution already discussed in at least one other thread with the OP) and then count to count the number of items returned. To ignore the contents of tags, the simplest solution would be, as mjdamato said, to strip the tags first (using strip_tags or a regular expression).
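A minimal sketch of the approach described above, assuming PHP 7 or later with PCRE. The helper name `count_word` is hypothetical. One caveat: strip_tags removes tags without putting anything in their place, so `hear</b>hear` becomes `hearhear` and neither occurrence counts as a whole word. The sketch therefore replaces each tag with a space, using a deliberately naive tag pattern that is an assumption and will not cope with comments, scripts, or a `>` inside an attribute value.

```php
<?php
// Hypothetical helper: counts whole-word occurrences of $word in $html,
// ignoring anything inside HTML tags.
function count_word(string $html, string $word): int
{
    // Replace each tag with a space so the words on either side stay separate.
    // Naive pattern: assumes no '>' appears inside attribute values.
    $text = preg_replace('~<[^>]*>~', ' ', $html);

    // Escape the word so regex metacharacters are matched literally.
    $pattern = '~\b' . preg_quote($word, '~') . '\b~';

    // preg_match_all() returns the number of full matches (false on error).
    return (int) preg_match_all($pattern, $text);
}

$html = 'hear heard hearing heard <b>hear hear hear</b>hear';

echo count_word($html, 'hear'), "\n";                       // 5: whole words only
echo preg_match_all('~\bhear\b~', strip_tags($html)), "\n"; // 3: "hearhear" is merged
echo substr_count($html, 'hear'), "\n";                     // 8: every substring
```

The three results line up with the "anything from 3 to 8" range mentioned in the original question.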
salathe Posted November 30, 2009

cags wrote: "… then to simply use count to count the number of items returned."

Or, just look at the return value from preg_match_all, which is the number of matches found.
cags Posted November 30, 2009

Well obviously if you wanted to do it the easy way...
gerkintrigg Posted December 1, 2009

I'm using the following code to try to count the number of times "web" occurs in the string:

    $pattern='~\b'.$word.'\b(?![^<]*?>)~';
    $string="websites on the web are cobwebs";
    if($r['flagged']=='y'){
        $style='flagged';
        $plus = count(preg_match($pattern, strip_tags($my_page)));
        $_SESSION['flagged']=$_SESSION['flagged']+$plus;
    }

I need to count only "web" and not "websites" or "cobwebs". I know I could explode the string on spaces, but I have good reasons not to, perhaps too many to go into here.
salathe Posted December 1, 2009

Given the $string and $word values, a basic method of counting occurrences of that word looks like this:

    $string = 'websites on the web are cobwebs';
    $word = 'web';
    $word_escaped = preg_quote($word, '~');
    $pattern = '~\b' . $word_escaped . '\b~';
    $count = preg_match_all($pattern, $string, $matches);
    echo "'$word' occurs $count time(s) in '$string'.";
gerkintrigg Posted December 1, 2009

Excellent! That works fine. Thank you! I used similar code to work out the total word count, but I had to divide it by 2 for some reason:

    <?php
    $pattern = '~\b\b(?![^<]*?>)~';
    echo preg_match_all($pattern, $page, $matches)/2;
    ?>

I am curious, but if it works, that's all I'm interested in for the moment.
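A likely explanation for the division by 2: `\b` is zero-width, so `\b\b` behaves like a single `\b` and matches once at every word boundary, and each word has two boundaries (its start and its end). The sketch below compares that with matching each word once. The sample `$page` value is made up for illustration, and treating a word as `\w+` (letters, digits and underscore) is an assumption, so apostrophes and hyphens would split a word in two.

```php
<?php
// Made-up sample page for illustration.
$page = 'websites on the <b>web</b> are cobwebs';

// The pattern from the post above: every word boundary outside a tag.
$boundaries = preg_match_all('~\b\b(?![^<]*?>)~', $page);

// Each run of word characters outside a tag, matched once.
$words = preg_match_all('~\b\w+\b(?![^<]*?>)~', $page);

echo "$boundaries boundaries, $words words\n"; // prints "12 boundaries, 6 words"
```

Matching `\w+` directly gives the word count without the division, at the cost of committing to one particular definition of a word.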