Complex Word Counting (Python -> PHP)


Sorry for the huge post - big problem!

 

Hi guys. I'm trying to port a Python program to PHP which counts the number of words in a file. Sounds simple? It's not: the file is written in a programming language called RSL, which is used to program robots. The counting program needs to leave out lines that contain certain commands, and must not count words after a comment indicator. Also, some other characters count as words, but some don't.

The Python program is quite simple, but it uses some annoyingly useful built-in Python features that I really need, and that are much more complex in PHP!

 

The idea is this:

1. A user writes a robot

2. He/she needs the number of words counted to see if it can be entered into a competition

3. He/she copies and pastes the robot into a form on a webpage

4. PHP counts the (correct) words and outputs the total to the page

 

I don't know how to do this. I have tried, and failed. Here is the code which I was working on when I gave up:

<?php

$wordChars = '{}()[]+-*/%&^|=<>!\\';
$a_wordChars = array ( "{","}","(",")","[","]","+","-","*","/","%","&","^","|","=","<",">","!" );
$stopWords = array ('name', 'author', 'version', 'style');
$freeWords = array ('and', 'or', 'endif', 'endw');

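// Default to an empty string when the form has not been submitted yet
$robot = isset($_POST['textarea']) ? $_POST['textarea'] : '';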

$code = explode("\n",$robot);

$linecnt = 0;

$t_wordcnt = 0;

$wordcnt = 0;

foreach($code as $line_num => $wline)
{
	$line = trim($wline);
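	// Skip blank lines and lines that start with a comment character
	if($line === '' or startsWith($line,"/") == 1 or startsWith($line,"#") == 1)
		continue;

	// Count the words on this line and add them to the running total
	$t_wordcnt = str_word_count($line,0,$wordChars);
	$wordcnt += $t_wordcnt;
	//echo($line . " - " . $t_wordcnt . "<br />");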

}

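// Returns 1 if $line begins with the single character $del, otherwise 0
function startsWith($line, $del)
{
	// An empty line has no first character to compare
	if($line === '')
		return 0;

	if($line[0] == $del)
		return 1;
	else
		return 0;
}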

?>

<form class="countForm" action="index.php" method="POST">
  <p>
    <textarea name="textarea" cols="100" rows="30" wrap="off"></textarea>
</p>
  <p>
    <input type="submit" name="Count" value="Count" />
</p>
</form>

 

I did get a bit further, but I deleted it when it got too complicated!

 

IMPORTANT!:

Things it should and shouldn't count:

The $freeWords array is an array of words which shouldn't be counted.

The $stopWords array is an array of words which indicate the rest of the line shouldn't be counted.

Anything inside a "..." string (in the robot) should be counted as one word.

The $wordChars string (and the matching $a_wordChars array) holds the characters which count as words.

 

I think that's everything.

 

Obviously I don't want to put anyone to the trouble of coding the whole thing, but some suggestions as to possible routes to a solution would be nice!

 

Just ask for more info!

 

Thanks in advance and sorry for the massive post,

Tom

 

EDIT: You can see what it does at the moment at botcount.qubesite.co.uk


Sure:

def BotWords (code, verbose = False,
		wordChars = '{}()[]+-*/%&^|=<>!\xaa\xa5\xa4\xa3\x7c\x26',
		stopWords = ['name', 'author', 'version', 'style'],
		freeWords = ['and', 'or', 'endif', 'endw']):
	# Accept either a file name or a list of lines
	if isinstance (code, str):
		botFile = open (code, 'r')
		code = botFile.readlines ()
		botFile.close ()
	# Map every word character to a space so it acts as a separator
	# (str.maketrans is the Python 3 way to build the table)
	wordTbl = str.maketrans (wordChars, ' ' * len (wordChars))
	wordCnt = 0
	for line in code:
		# Drop everything after a comment marker
		line = line.split ('#')[0]
		line = line.split ('//')[0]
		line = line.translate (wordTbl)
		line = line.split ()
		lineCnt = 0
		inString = 0
		finalWords = []
		for word in line:
			if word.lower () in stopWords:
				# A stop word makes the rest of the line free
				break
			elif word.endswith ('"'):
				# End of a "..." string: the whole string is one word
				lineCnt += 1
				inString = 0
				finalWords.append (word)
			elif word.startswith ('"'):
				# Start of a multi-word string: wait for the closing quote
				inString = 1
			elif not inString and word and word.lower () not in freeWords:
				lineCnt += 1
				finalWords.append (word)
		wordCnt += lineCnt
		if verbose and finalWords:
			print (lineCnt, finalWords)
	return wordCnt
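
For anyone porting this, it may help to see the per-line rules on their own. What follows is only a rough Python 3 sketch of the loop body of BotWords applied to one line at a time. The names count_line, WORD_CHARS, STOP_WORDS, FREE_WORDS and WORD_TABLE are made up for this example, the extra \x.. characters from wordChars are left out for brevity, and the sample lines are invented rather than real RSL. One thing it makes visible: the word characters are translated to spaces, so in this program they behave as separators and are not counted themselves.

```python
# Hypothetical names for illustration; values copied from the BotWords defaults
# (minus the extra \x.. characters).
WORD_CHARS = '{}()[]+-*/%&^|=<>!'
STOP_WORDS = {'name', 'author', 'version', 'style'}
FREE_WORDS = {'and', 'or', 'endif', 'endw'}

# Translation table: every word character becomes a space (a separator).
WORD_TABLE = str.maketrans(WORD_CHARS, ' ' * len(WORD_CHARS))


def count_line(line):
    """Count the words on one line, following the same order of rules as BotWords."""
    # Drop everything after a comment marker.
    line = line.split('#')[0].split('//')[0]
    count = 0
    in_string = False
    for word in line.translate(WORD_TABLE).split():
        if word.lower() in STOP_WORDS:
            break                   # the rest of the line is not counted
        elif word.endswith('"'):
            count += 1              # a closed "..." string counts as one word
            in_string = False
        elif word.startswith('"'):
            in_string = True        # inside a string until the closing quote
        elif not in_string and word.lower() not in FREE_WORDS:
            count += 1
    return count


# Invented sample lines, not real RSL.
print(count_line('name "My Robot"'))           # prints 0 (stop word comes first)
print(count_line('if (x > 1) and y  // hi'))   # prints 4 (if, x, 1, y)
print(count_line('print "hello big world"'))   # prints 2 (print + one string)
```

In PHP the translate step could probably be done with strtr() and the splitting with preg_split() on whitespace, but that is a suggestion to try, not something tested here.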
