torvald_helmer Posted April 10, 2007
I want to count the number of documents that contain a specified term (word). This is known as document frequency (DF). I have an array that contains a list of .txt files. I loop over these files, reading one at a time, and for each file I loop over its terms (words). Inside these two loops I count how many times each term occurs in the file, and I also remove duplicate occurrences of a term. So my result is an array of the terms in a file (one occurrence of each term) and an array with the number of times each term occurs in that file. I also want to know how many of the files each of these terms occurs in. Can I do this inside the loop somehow? Or is there another smart way to solve this?
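The per-file loop described above can also collect DF in the same pass. Once a file's terms are de-duplicated, each remaining term adds exactly 1 to its document count. Below is a minimal sketch, assuming PHP 7.1 or later, case-insensitive terms, and str_word_count() as the tokenizer; the function names tokenize() and termAndDocumentFrequency() are hypothetical and not from the thread.

```php
<?php
// Sketch: one pass over the files computes both per-file term counts (TF)
// and document frequency (DF). Assumptions: plain-text files, terms are
// compared case-insensitively, and str_word_count() is good enough to split words.

/** Splits a text into lowercase words (hypothetical helper). */
function tokenize(string $text): array
{
    // Format 1 makes str_word_count() return an array of the words it finds.
    return str_word_count(strtolower($text), 1);
}

/**
 * Returns [$tf, $df]:
 *   $tf[$file][$term] = number of times $term occurs in $file
 *   $df[$term]        = number of files that contain $term at least once
 */
function termAndDocumentFrequency(array $files): array
{
    $tf = [];
    $df = [];
    foreach ($files as $file) {
        $text = file_get_contents($file);
        if ($text === false) {
            continue; // skip files that cannot be read
        }
        // array_count_values() de-duplicates the terms (keys) and counts them (values).
        $counts = array_count_values(tokenize($text));
        $tf[$file] = $counts;
        // Keys are unique within this file, so each term adds at most 1 to its DF.
        foreach ($counts as $term => $n) {
            $df[$term] = ($df[$term] ?? 0) + 1;
        }
    }
    return [$tf, $df];
}
```

Usage: `[$tf, $df] = termAndDocumentFrequency($files);` followed by `$df['someword'] ?? 0`. Doing both counts in one loop means every file is read only once, which answers "can I do this inside the loop" with a yes under these assumptions.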
Barand Posted April 10, 2007
This may be useful: http://www.pgp.net/substr_count
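The link points at PHP's substr_count(), which counts how often one string occurs inside another. This sketch shows one way it could be used for DF by counting each document at most once. It assumes PHP 7 or later and that the file contents are already loaded as strings. The helper names containsTerm() and documentFrequencyBySubstring() are hypothetical. One caveat: substr_count() matches raw substrings, so "cat" also matches inside "category".

```php
<?php
// Sketch: DF via substr_count(), one document counted at most once.
// Caveat: substring matching, so "cat" is also found inside "category".

/** True if $term occurs anywhere in $text (case-insensitive, by assumption). */
function containsTerm(string $text, string $term): bool
{
    return substr_count(strtolower($text), strtolower($term)) > 0;
}

/** Number of texts in which $term appears at least once. */
function documentFrequencyBySubstring(array $texts, string $term): int
{
    $df = 0;
    foreach ($texts as $text) {
        if (containsTerm($text, $term)) {
            $df++; // increment once per document, not once per occurrence
        }
    }
    return $df;
}
```

Here `documentFrequencyBySubstring(["the cat", "a category", "a dog"], "cat")` counts "a category" as a match. Whole-word matching needs a word list (as with str_word_count()) or a word-boundary regular expression instead.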
torvald_helmer Posted April 10, 2007
It didn't work quite the way I wanted. I have also tried this, but it doesn't seem to work either:

    foreach ($terms as $term) {          /* array of terms from one file */
        $DF[$term] = 0;
        foreach ($files as $file) {      /* array of all files */
            if (in_array($term, $file)) {
                $DF[$term]++;            /* increase the count by one each time */
            }
        }
    }
Barand Posted April 10, 2007
Does the $files array contain filenames or the contents of the files?
torvald_helmer Posted April 10, 2007
$files is the list of files; I use it to make sure I go through all of them. $terms contains all the words from one file.
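If $files holds filenames, the nested loop above cannot work as written. in_array() expects an array as its second argument, but $file is just a filename string, so the check never looks at the file's words. The sketch below keeps the same term-outer, file-inner structure but first reads each file once and stores its unique words as array keys, so the membership test becomes an isset() lookup. It assumes PHP 7 or later, filenames in $files, lowercase matching, and str_word_count() tokenization. countFilesContaining() is a hypothetical name.

```php
<?php
// Sketch of the thread's nested-loop approach, fixed so the inner check
// looks at each file's words instead of its filename.
// Assumes $files holds filenames and $terms holds words from one file.

function countFilesContaining(array $terms, array $files): array
{
    // Read every file once; keep its unique lowercase words as array keys
    // so membership is an O(1) isset() instead of an in_array() scan.
    $wordSets = [];
    foreach ($files as $file) {
        $text = file_get_contents($file);
        $words = ($text === false) ? [] : str_word_count(strtolower($text), 1);
        $wordSets[$file] = array_flip($words);
    }

    $df = [];
    foreach (array_unique($terms) as $term) {
        $df[$term] = 0;
        foreach ($wordSets as $set) {
            if (isset($set[strtolower($term)])) {
                $df[$term]++; // one increment per file containing the term
            }
        }
    }
    return $df;
}
```

Building the word sets up front means each file is read once instead of once per term. If $terms comes from one of the files in $files, every term will have a DF of at least 1.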