dylansnow (Posted June 23, 2008)

I am getting some mysterious results when counting a specific word in a file. The count changes depending on the number I put in for length in this line:

    $each_line = fgetss($the_page, 10000);

There seems to be no discernible pattern between the length I set and the count that comes back. Is there a magic number or a rule of thumb to use? I am new to PHP, but well versed in ActionScript (if that makes a difference in your answer). The entire code is below. It searches a news site for a given word and returns the number of times it appears. Any explanation of or insight into the length parameter would be great.

    <?
    $search_criteria = "the";
    $size = 0;
    $url = "http://news.google.com/news?ned=tus&rec=0";
    $the_page = fopen($url, "b");
    while(!feof($the_page)) {
        $each_line = fgetss($the_page, 1050);
        // strip_tags ($the_page);
        // $each_line = ($the_page);
        if(eregi($search_criteria, $each_line, $results)) {
            // for each line where there is a match, increment a counter
            $size++;
        }
    }
    fclose($the_page);
    print("I found $size ocurrences of '$search_criteria' at $url");
    ?>

btherl (Posted June 23, 2008)

Have you tried a length greater than the entire HTML result? I would try 1024*1024 (1MB).

dylansnow (Author, Posted June 23, 2008)

I just gave 1024*1024 a try and got 1 occurrence of the word. Then I changed the length to 50 and got 147 occurrences. What I don't understand is why I would get a higher number of hits when reading a lower number of bytes. It doesn't make sense to me.

DarkWater (Posted June 23, 2008)

I know exactly why. eregi() only tells you whether the pattern was found in the string you hand it, not how many times it was found. Since you're essentially reading the WHOLE FILE in one go with 1024*1024, the loop only runs once, eregi matches once, and $size ends up equal to one. With a length of 50 the page comes in as lots of small chunks, so you are really counting the chunks that contain the word.

Instead of reading the file piece by piece, fetch it with file_get_contents() and do substr_count() on it. o-O Or explode it by newlines and count within each line.

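To make the explanation above concrete, here is a minimal sketch, not the original poster's script. The helper names `count_matching_chunks` and `count_occurrences` and the sample sentence are made up for illustration, and `str_split()` with `stripos()` only approximates what `fgetss()` with `eregi()` did (the real `fgetss()` also stops at newlines and strips tags). It shows that the per-chunk counter moves with the chunk length while the true number of occurrences stays the same.

```php
<?php
// Hypothetical helper: mimics the original loop by counting CHUNKS that
// contain the word, not occurrences of the word.
function count_matching_chunks($text, $needle, $length)
{
    $hits = 0;
    foreach (str_split($text, $length) as $chunk) {
        // stripos() is case-insensitive, like eregi(). It returns a
        // position or false, so each chunk can add at most 1.
        if (stripos($chunk, $needle) !== false) {
            $hits++;
        }
    }
    return $hits;
}

// Hypothetical helper: counts every occurrence in the whole text.
// substr_count() is case-sensitive, so both sides are lowercased first.
function count_occurrences($text, $needle)
{
    return substr_count(strtolower($text), strtolower($needle));
}

// Made-up sample text standing in for the downloaded page.
$sample = "The cat saw the dog. Then the dog saw the cat.";

echo count_matching_chunks($sample, "the", 1000), "\n"; // one big chunk
echo count_matching_chunks($sample, "the", 10), "\n";   // several small chunks
echo count_occurrences($sample, "the"), "\n";           // every occurrence
```

With this sample the three calls give 1, 3 and 5. Note that the small-chunk version still undercounts, because a word that straddles a chunk boundary is never seen whole. For a real page, something like `count_occurrences(strip_tags(file_get_contents($url)), "the")` should work, assuming `allow_url_fopen` is enabled on the server.
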
The Little Guy (Posted June 23, 2008)

Give this a try:

    <?php
    $search_criteria = "~the~";
    $size = 0;
    $url = "http://news.google.com/news?ned=tus&rec=0";
    $the_page = fopen($url, "r");
    $contents = fread($the_page, filesize($url));
    $lines = explode("\n",$contents);
    foreach($lines as $line){
        preg_match_all($search_criteria, $line, $matches);
        $size = $size + count($matches[0]);
    }
    fclose($the_page);
    print("I found $size ocurrences of '$search_criteria' at $url");
    ?>

dylansnow (Author, Posted June 23, 2008)

DarkWater: Thanks, I'll give that a try and post the results here.

The Little Guy: I get errors with your code:

    Warning: filesize() [function.filesize]: stat failed for http://news.google.com/news?ned=tus&rec=0 in /home/dylansno/public_html/area51/test7.php on line 6
    Warning: fread() [function.fread]: Length parameter must be greater than 0 in /home/dylansno/public_html/area51/test7.php on line 6
    I found 0 ocurrences of '~the~' at http://news.google.com/news?ned=tus&rec=0

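The warnings above come from calling filesize() on an http:// URL: the size of a remote page cannot be stat'ed, so fread() is handed a length of 0. One possible way around it, sketched below under stated assumptions, is to read the stream until EOF with stream_get_contents() and count matches over the whole text. The helper name `count_in_stream` is hypothetical, and a `php://memory` stream with made-up HTML stands in for `fopen($url, "r")` so the sketch runs without network access.

```php
<?php
// Hypothetical helper: reads an already-open stream to EOF and counts
// case-insensitive matches of a literal word in its tag-stripped text.
function count_in_stream($handle, $word)
{
    // stream_get_contents() reads until EOF, so no length is needed.
    $contents = stream_get_contents($handle);
    if ($contents === false) {
        return 0;
    }
    // preg_quote() escapes the word; the i modifier ignores case.
    $pattern = '~' . preg_quote($word, '~') . '~i';
    // preg_match_all() returns the number of full pattern matches.
    return (int) preg_match_all($pattern, strip_tags($contents), $matches);
}

// Stand-in for fopen($url, "r"): an in-memory stream with sample HTML.
$handle = fopen('php://memory', 'w+');
fwrite($handle, "<p>The news: the cat and the dog.</p>\n<p>Nothing else.</p>");
rewind($handle);

$size = count_in_stream($handle, 'the');
fclose($handle);

print("I found $size occurrences of 'the'");
```

For the sample markup this counts 3. Against the real page, the same helper could be given the handle returned by `fopen($url, "r")`, again assuming `allow_url_fopen` is enabled.
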
The Little Guy (Posted June 23, 2008)

NVM