A word count script


richardjh

I've been trying to come up with a way to accurately count the words in a string that includes punctuation marks. I've got close, but the white space is causing a problem. I have used a couple of functions, such as str_replace and explode, and I can now get an accurate count for texts with most punctuation and 'normal' white space. BUT if I put in three extra spaces between words, the count goes up by 1.

$words2 = str_replace("-", "", $string); // strips hyphens
$words3 = str_replace('  ', ' ', $words2); // replaces double spaces with single spaces
$words = explode(' ', $words3);

This is obviously not a complete script, just a string of text being passed through three functions to clean it up. But as you can see, it turns a pair of spaces into one space, while a longer run only gets shortened, not collapsed. So three, four, five or more spaces are still counted as extra words.
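
A small sketch of what is going on, using a made-up sample string: str_replace() makes a single pass and does not re-scan its own output, so three spaces shrink to two rather than one, and explode() then returns an empty element that gets counted.

```php
<?php
// Why one str_replace() pass is not enough: it does not re-scan its own
// output, so a run of three spaces shrinks to two, not one.
$text = "one   two"; // three spaces between the words

$once = str_replace('  ', ' ', $text);
echo "[" . $once . "]\n"; // prints "[one  two]" (two spaces remain)

// explode() yields an empty string between the two remaining spaces,
// and count() includes that empty element.
$parts = explode(' ', $once);
echo count($parts), "\n"; // prints 3, although there are only two words
```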

What else could I try to give me an accurate count?

I am very raw at PHP.  :'(

Thanks

Richard

Well, it all depends on what you consider a word. Looking at what you have, it appears that you want any group of characters separated by any number of spaces or hyphens to count as a word. You could use string functions or possibly regular expressions. But, as you saw, a single str_replace() call can't directly handle an arbitrary number of consecutive spaces, because it only makes one pass over the string. So, one solution is to use a loop that keeps replacing double spaces with single spaces until no consecutive spaces remain.

function wordCount($string)
{
    //Replace hyphens with spaces so they act as word separators
    $string = str_replace('-', ' ', $string);
    //As long as there are double spaces - replace with single spaces
    while(strpos($string, '  ')!==false)
    {
        $string = str_replace('  ', ' ', $string);
    }
    //Trim the ends so leading/trailing spaces don't add empty "words"
    return count(explode(' ', trim($string)));
}

For some reason I think there has to be a built-in function that would be more appropriate, but I can't think of it at the moment. Also, what about other "non-printable" characters such as tabs and newlines? Or what if there are punctuation characters by themselves? If you want only alphanumeric characters to make up words, then a regex solution is probably better.

function wordCount($string)
{
    return preg_match_all("#\b[\w]+\b#", $string, $matches);
}

In the above, the characters a-z, A-Z, 0-9 and the underscore can make up words.
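
For what it's worth, PHP does ship a built-in for this: str_word_count(). A minimal sketch follows. Note that its idea of a word is not the same as the \w pattern above: it counts runs of letters, allows apostrophes and hyphens inside a word, and ignores digits. So it is worth checking against your own text before relying on it.

```php
<?php
// str_word_count() with one argument returns the number of words found.
// Letters form words; "'" and "-" may appear inside a word, so "isn't" and
// "well-known" count once each. Digits are not letters, so "42" is skipped.
// Which characters count as letters depends on the current locale.
echo str_word_count("this is    my string.  It is awesome!"), "\n"; // 7
echo str_word_count("a well-known fact, isn't it?"), "\n";         // 5
echo str_word_count("room 42"), "\n";                              // 1
```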

Thank you for the quick replies and help.

I found that these three lines seem to be doing what I want:

$word1 = preg_replace('/\s+/', ' ', trim($text)); // trim the ends, then collapse any run of white space into one space
$word = explode(' ', $word1);
$words = count($word);

Using this I can put any amount of white space between words and the count remains the same (which I want).
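
One caveat with an explode()-based count: explode() always returns at least one element, so an empty string still counts as one word, and any white space left at either end of the text adds an empty element. A minimal sketch that sidesteps this uses preg_split() with the PREG_SPLIT_NO_EMPTY flag; countWhitespaceWords is just an illustrative name.

```php
<?php
// Split on any run of white space (spaces, tabs, newlines) and drop
// empty pieces, so leading/trailing white space and empty input do not
// add phantom words.
function countWhitespaceWords(string $text): int
{
    $words = preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY);
    return count($words);
}

echo countWhitespaceWords("  one   two\tthree\n"), "\n"; // 3
echo countWhitespaceWords(""), "\n";                     // 0
```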

I will test it a bit more though before getting my hopes up.  :)

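Try this: it collapses runs of two or more white-space characters into a single space, then splits the string into an array on spaces, "!" and ".". Each empty item in the array marks a place where one of those punctuation marks sat next to a space or at the end of the string.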

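<?php
header("content-type: text/plain");
$str = "this is    my string.  It is awesome!";
// Collapse any run of two or more white-space characters into one space
$str = preg_replace("/\s\s+/", " ", $str);

// Split on a space, "!" or "."; punctuation next to a space (or at the
// end of the string) leaves an empty item in the array
$arr = preg_split("/ |!|\./", $str);
print_r($arr);
echo $str;
?>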
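
The script above prints the array but stops short of a number. A minimal sketch of turning the same idea into a count, assuming only spaces, "!" and "." act as separators as in that pattern, is to drop the empty items before counting. countSplitWords is a hypothetical name, and the short arrow function assumes PHP 7.4 or later.

```php
<?php
// Same steps as the script above, but returning a count: collapse
// white space, split on a space, "!" or ".", then ignore empty items.
function countSplitWords(string $str): int
{
    $str = preg_replace('/\s\s+/', ' ', $str);
    $arr = preg_split('/ |!|\./', $str);
    // Empty items come from punctuation next to a space or at the end
    // of the string, so they are not words.
    $nonEmpty = array_filter($arr, fn (string $item): bool => $item !== '');
    return count($nonEmpty);
}

echo countSplitWords("this is    my string.  It is awesome!"), "\n"; // 7
```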

If you are going to use regular expressions, I already gave you a one-line solution that works:

function wordCount($string)
{
    return preg_match_all("#\b[\w]+\b#", $string, $matches);
}

Or, if you don't want to use a function:

$words = preg_match_all("#\b[\w]+\b#", $string, $matches);

As I stated above, this counts any run of the characters a-z, A-Z, 0-9 and underscore as a word. If you want to expand that set of characters (or use a blacklist instead), the pattern is easy to modify.
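
As a sketch of what such modifications might look like (the function names and the exact character lists are illustrative choices, not anything from the thread): the whitelist version lets a single apostrophe or hyphen join two runs of word characters, and the blacklist version treats everything except white space and a few listed punctuation marks as part of a word.

```php
<?php
// Whitelist: runs of \w, optionally joined by an apostrophe or hyphen,
// so "It's", "isn't" and "well-known" each count as one word.
function wordCountWhitelist(string $string): int
{
    return preg_match_all("#\w+(?:['-]\w+)*#", $string);
}

// Blacklist: anything that is not white space or one of the listed
// punctuation marks is part of a word. "-" is on the list, so a lone
// dash is not counted, but "well-known" splits into two words.
function wordCountBlacklist(string $string): int
{
    return preg_match_all('#[^\s.,;:!?"()-]+#', $string);
}

$text = "It's a well-known fact - isn't it?";
echo preg_match_all('#\b[\w]+\b#', $text), "\n"; // 9 with the pattern from this thread
echo wordCountWhitelist($text), "\n";            // 6
echo wordCountBlacklist($text), "\n";            // 7
```

Which trade-off is right depends on how you want contractions, hyphenated words and lone dashes to be counted.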
