length mystery


dylansnow

I am getting mysterious results when counting occurrences of a specific word in a file. The count changes depending on the number I pass as the length in this line: $each_line = fgetss($the_page, 10000); I can't find any pattern relating the length I set to the count that comes back.

 

Is there a magic number or a rule of thumb to use? I am new to PHP but well versed in ActionScript (if that makes a difference to your answer). The entire code is below. It searches news sites for certain words and returns the number of times each one appears.

 

Any explanation of, or insight into, the length parameter would be great.

 

<?php
$search_criteria = "the";
$size = 0;
$url = "http://news.google.com/news?ned=tus&rec=0";
$the_page = fopen($url, "r");
while (!feof($the_page))
{
    $each_line = fgetss($the_page, 1050);
    // strip_tags ($the_page);
    // $each_line = ($the_page);
    if (eregi($search_criteria, $each_line, $results))
    {
        // for each line where there is a match, increment a counter
        $size++;
    }
}
fclose($the_page);
print("I found $size ocurrences of '$search_criteria' at $url");
?>


I just gave the 1024*1024 a try.

 

I got 1 occurrence of the specific word.

 

When I changed the length to 50, I got 147 occurrences of the specific word.

 

What I don't understand is why reading fewer bytes at a time would give a higher number of hits.

 

It doesn't make sense to me.


I know exactly why. Instead of reading the file line by line, fetch it with file_get_contents() and run substr_count() on it. o-O Or explode it on newlines and count the matches in each line.

 

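FYI, it doesn't work the way you expect because eregi() only tells you whether the pattern matched at all, not how many times it matched. Each fgetss() call returns one chunk (up to length - 1 bytes, or up to the end of the line, whichever comes first), and your loop adds at most 1 to $size per chunk. With 1024*1024 you're essentially running the WHOLE FILE through eregi() in one go, so it matches once and $size ends up equal to one. With a length of 50 the same text gets cut into lots of small chunks, and every chunk that contains the word adds 1.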
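To make the chunk effect concrete, here is a minimal sketch that needs no network access. The sample text and the helper name count_matching_chunks() are made up for illustration, and str_split() only roughly imitates how fgetss() cuts long lines (fgetss() actually returns at most length - 1 bytes per call). It compares "how many chunks contain the word" with "how many times the word occurs":

```php
<?php
// Hypothetical sample text standing in for the fetched page (not real data).
$text = "the cat and the dog\nthe end\n";

// Count the chunks that contain $word at least once, the way the
// original while/eregi loop does: each chunk can add at most 1.
function count_matching_chunks(string $text, string $word, int $chunkSize): int
{
    $count = 0;
    foreach (explode("\n", $text) as $line) {
        // Cut every line into pieces of at most $chunkSize bytes.
        foreach (str_split($line, $chunkSize) as $chunk) {
            if (stripos($chunk, $word) !== false) {
                $count++;
            }
        }
    }
    return $count;
}

// Real number of (case-insensitive) occurrences in the whole text.
$occurrences = substr_count(strtolower($text), 'the');

echo "occurrences: $occurrences\n";
echo "chunks of 1000: " . count_matching_chunks($text, 'the', 1000) . "\n";
echo "chunks of 8: " . count_matching_chunks($text, 'the', 8) . "\n";
echo "chunks of 2: " . count_matching_chunks($text, 'the', 2) . "\n";
```

With big chunks the count drops below the real number because a line with two hits still counts once. With very small chunks the word itself gets cut in half and is missed. Neither number is the true occurrence count, which is why substr_count() on the whole text is the safer route.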

Link to comment
https://forums.phpfreaks.com/topic/111446-length-mystery/#findComment-572293
Share on other sites

Give this a try:

 

<?php
$search_criteria = "~the~";
$size = 0;
$url = "http://news.google.com/news?ned=tus&rec=0";
$the_page = fopen($url, "r");
$contents = fread($the_page, filesize($url));
// filesize() cannot stat an http:// URL, so the fread() call above fails for remote pages
$lines = explode("\n", $contents);
foreach ($lines as $line) {
    // count every match in the line, not just whether the line matches
    preg_match_all($search_criteria, $line, $matches);
    $size = $size + count($matches[0]);
}
fclose($the_page);
print("I found $size ocurrences of '$search_criteria' at $url");
?>


DarkWater:

Thanks, I'll give that a try and post the results here.

 

The Little Guy:

I get errors with your code:

 

Warning: filesize() [function.filesize]: stat failed for http://news.google.com/news?ned=tus&rec=0 in /home/dylansno/public_html/area51/test7.php on line 6

 

Warning: fread() [function.fread]: Length parameter must be greater than 0 in /home/dylansno/public_html/area51/test7.php on line 6

I found 0 ocurrences of '~the~' at http://news.google.com/news?ned=tus&rec=0
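The warnings come from filesize(), which cannot stat an http:// URL, so fread() ends up with a length of 0. One possible way around it, sketched here under assumptions, is to read the stream to the end with stream_get_contents() instead of asking for its size first. The helper name count_word_in_stream() is made up for this example, and a php://memory stream with invented markup stands in for the remote page so the snippet runs without a network connection:

```php
<?php
// Hypothetical helper: read an already opened stream to the end,
// strip the HTML tags, and count case-insensitive matches of $word.
function count_word_in_stream($handle, string $word): int
{
    // No length is needed here, so filesize() is never called.
    $contents = stream_get_contents($handle);
    if ($contents === false) {
        return 0;
    }
    $text = strip_tags($contents);
    // preg_quote() keeps any special characters in $word literal.
    $pattern = '~' . preg_quote($word, '~') . '~i';
    return (int) preg_match_all($pattern, $text);
}

// Stand-in for the remote page: an in-memory stream with made-up markup.
$handle = fopen('php://memory', 'w+');
fwrite($handle, "<p>The quick fox</p>\n<p>jumps over the lazy dog</p>\n");
rewind($handle);

$found = count_word_in_stream($handle, 'the');
fclose($handle);

echo "I found $found occurrences of 'the'\n";
```

For a real page the same function could be handed the result of fopen($url, "r"), assuming allow_url_fopen is enabled on the server. Note that this pattern also matches "the" inside longer words such as "other"; adding \b word boundaries around the quoted word would restrict it to whole words.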
