length mystery


dylansnow

I am getting some mysterious results when counting occurrences of a specific word in a file. The count changes depending on the number I pass as the length in this line: $each_line = fgetss($the_page, 10000); There seems to be no discernible pattern between the length I use and the count that comes back.

 

Is there a magic number or a rule of thumb to use? I am new to PHP, but well versed in ActionScript (if that makes a difference in your answer). The entire code is below. It searches news sites for certain words and returns the number of times each one appears.

 

Any explanation of or insight into the length parameter would be great.

 

<?
$search_criteria = "the";
$size = 0;
$url = "http://news.google.com/news?ned=tus&rec=0";
$the_page = fopen($url, "b");
while(!feof($the_page))
{
    $each_line = fgetss($the_page, 1050);
    // strip_tags ($the_page);
    // $each_line = ($the_page);
    if(eregi($search_criteria, $each_line, $results))
    {
        // for each line where there is a match, increment a counter
        $size++;
    }
}
fclose($the_page);
print("I found $size ocurrences of '$search_criteria' at $url");
?>


I just gave the 1024*1024 a try.

 

I got 1 occurrence of the specific word.

 

When I changed the length to 50, I got 147 occurrences of the specific word.

 

What I don't understand is why I would get a higher number of hits when reading a lower number of bytes.

 

It doesn't make sense to me.


I know exactly why. Instead of reading the file line by line, fetch it with file_get_contents() and run substr_count() on it. o-O Or explode it by newlines and count the matches in each line.

 

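FYI, it doesn't work like that because eregi() only tells you whether the pattern matched at all, not how many times it matched. Your loop adds 1 to $size for every chunk that contains at least one match, so $size counts matching chunks, not occurrences. With 1024*1024 you're essentially reading the WHOLE FILE in one call (assuming the page is basically one long line), so eregi() succeeds once, the loop body only runs once, and $size ends up equal to one. With a length of 50 the same text is cut into many small chunks, and each chunk that contains the word gets counted.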
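
To make the effect concrete, here is a minimal sketch that reproduces it on an in-memory string instead of a remote page. The helper name count_matching_chunks and the sample text are made up for illustration, and fgets() with stripos() stand in for fgetss() and eregi(), which were removed in PHP 8 and PHP 7 respectively. Like fgetss(), fgets() returns at most length - 1 bytes per call and stops early at a newline.

```php
<?php
// Hypothetical helper: mimics the original loop by counting how many chunks
// contain $needle at least once (NOT how many times $needle occurs).
function count_matching_chunks(string $text, string $needle, int $length): int
{
    // An in-memory stream stands in for the remote page.
    $handle = fopen('php://memory', 'r+');
    fwrite($handle, $text);
    rewind($handle);

    $count = 0;
    // fgets() returns at most $length - 1 bytes, or less if it hits a newline.
    while (($chunk = fgets($handle, $length)) !== false) {
        if (stripos($chunk, $needle) !== false) {
            $count++; // one increment per chunk, however many matches it holds
        }
    }
    fclose($handle);
    return $count;
}

// One long line containing "the" three times.
$text = "the cat and the dog and the bird";

echo count_matching_chunks($text, "the", 1024), "\n"; // 1: the whole line is one chunk
echo count_matching_chunks($text, "the", 6), "\n";    // 2: one "the" is split across two chunks
echo substr_count(strtolower($text), "the"), "\n";    // 3: the actual number of occurrences
```

So a large length undercounts because several matches collapse into one chunk, and a small length can still miss a match that straddles a chunk boundary. Neither setting gives a reliable count, which is why counting over the whole string is the safer route.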


give this a try:

 

<?php
$search_criteria = "~the~";
$size = 0;
$url = "http://news.google.com/news?ned=tus&rec=0";
$the_page = fopen($url, "r");
$contents = fread($the_page, filesize($url));
$lines = explode("\n",$contents);
foreach($lines as $line){
    preg_match_all($search_criteria, $line, $matches);
    $size = $size + count($matches[0]);
}
fclose($the_page);
print("I found $size ocurrences of '$search_criteria' at $url");
?>


DarkWater:

Thanks, I'll give that a try and post the results here.

 

The Little Guy:

I get errors with your code:

 

Warning: filesize() [function.filesize]: stat failed for http://news.google.com/news?ned=tus&rec=0 in /home/dylansno/public_html/area51/test7.php on line 6

 

Warning: fread() [function.fread]: Length parameter must be greater than 0 in /home/dylansno/public_html/area51/test7.php on line 6

I found 0 ocurrences of '~the~' at http://news.google.com/news?ned=tus&rec=0
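
The warnings come from filesize(), which cannot stat an http:// URL, so fread() is handed a length of 0. A possible way around that, along the lines of the file_get_contents() suggestion earlier in the thread, is sketched below. The helper names count_word and count_whole_word are made up for this example, and the remote fetch is left commented out because it assumes allow_url_fopen is enabled on the server.

```php
<?php
// Hypothetical helper: case-insensitive count of $word as a plain substring,
// after stripping HTML tags (roughly what fgetss() did for each line).
function count_word(string $html, string $word): int
{
    $text = strip_tags($html);
    return substr_count(strtolower($text), strtolower($word));
}

// Hypothetical helper: same idea, but only whole words are counted,
// so "the" inside "other" is ignored.
function count_whole_word(string $html, string $word): int
{
    $pattern = '/\b' . preg_quote($word, '/') . '\b/i';
    $matches = preg_match_all($pattern, strip_tags($html));
    return $matches === false ? 0 : $matches;
}

// Fetching the real page (assumes allow_url_fopen is on; not run here):
// $html = file_get_contents("http://news.google.com/news?ned=tus&rec=0");
// if ($html !== false) { echo count_word($html, "the"), "\n"; }

$sample = "<p>The cat saw <b>the</b> other cat</p>";
echo count_word($sample, "the"), "\n";       // 3 ("The", "the", and the "the" in "other")
echo count_whole_word($sample, "the"), "\n"; // 2
```

Because the whole page is searched as one string, the result no longer depends on any read length.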

