
PHP Glob, Readdir() and even Exec( ls ) miss files in a directory


Folks, I have been pulling my hair out over this one for a couple of weeks now. I'm running Apache 2.4 with PHP 8.0 on my test server at home and PHP 7.4 on our production server at work (I tried updating to 8, but several WordPress plugins choked -- that's for another post). Home is openSUSE Linux, work is Red Hat Enterprise Linux 7.

First oddity: even with two different PHP versions, I get the same error. We have an online purchase order system that stores each POR as a simple text file in a "key|value" format. The vertical line is the separator; I can use explode() to split the values. All of the files are stored in the same directory. All have the exact same permissions and ownership. To wit, with a simple "ls -al" command in a terminal:

-rw-rw-r-- 1 stephen www  618 May 30 09:51 101006
-rw-rw-r-- 1 stephen www  672 Jun 21 20:01 101007
-rw-rw-r-- 1 stephen www  947 May 30 09:51 101011
-rw-rw-r-- 1 stephen www  602 May 30 09:51 101012
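
For readers unfamiliar with the "key|value" layout mentioned above, here is a minimal sketch of how such a file might be parsed with explode(). The function name parsePorText() and the field names in the usage note are hypothetical; the post does not show the real file contents or the real parsing code.

```php
<?php
// Hypothetical helper (not from the original post): turn "key|value" lines
// into an associative array.
function parsePorText(string $text): array
{
    $fields = [];
    foreach (preg_split('/\R/', $text) as $line) {
        $line = trim($line);
        if ($line === '' || strpos($line, '|') === false) {
            continue; // skip blank lines and lines without a separator
        }
        // A limit of 2 keeps any further "|" characters inside the value.
        [$key, $value] = explode('|', $line, 2);
        $fields[trim($key)] = trim($value);
    }
    return $fields;
}
```

Assuming a file contained the (made-up) lines "station|WDJC-FM" and "date|05/30/2023", parsePorText(file_get_contents($path)) would return an array keyed by "station" and "date".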

The file name is simply the purchase order number. Here's the thing: running this little test program ...

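$output = null;
$result = null;
// Remember the current directory so it can be restored afterwards.
$current = getcwd();
chdir( "/home/stephen/public_html/PORs/PORFiles" );
// exec() runs the command through a shell, so the shell expands "1*".
$cmdline = "ls 1*";
exec( $cmdline, $output, $result );
// $output holds one line of "ls" output per array element.
foreach( $output as $l )
{
    echo $l . "<br>";
}
chdir( $current );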

It misses everything between 101106 and 101146.

101105
101106
101146
101147

That test program is calling "ls" in an attempt to understand. I can list all files in a text terminal, but not in a PHP program. Remember, the same thing happens with version 7.4 or 8.0. I've tried to glob it like so:

$dirlist = glob( "PORFiles/1*" );
$numitems = count( $dirlist );

It misses the exact same numbers. I've tried "scandir()"; it skips the exact same files. So does an "opendir()/readdir()/closedir()" riff. This is seriously weirding me out. Have I been invaded by aliens? Am I hallucinating? We first noticed this when the main page (which is a simple index) came up. This is taken from the actual Web page generated by a PHP script that uses "glob()."

101147	WDJC-FM	05/30/2023 ... [snip]
101146	WDJC-FM	05/30/2023
-- note the gap -- (the index displays in descending order)
101106	WDCX-FM	05/26/2023
101105	WDCX-FM	05/27/2023

If you want to try recreating this problem, build a simple PHP or Bash script that populates a directory with hundreds of files, all sequentially named as shown above. I have over 700 files in this directory, so you'll need to let your script run for a while.
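
A minimal sketch of such a populate script in PHP might look like the following. The function name populateTestDir(), the starting number, and the placeholder file body are all assumptions for illustration; they are not taken from the real system.

```php
<?php
// Hypothetical helper: create $count files named $first, $first + 1, ...
// inside $dir, and return how many were written.
function populateTestDir(string $dir, int $first, int $count): int
{
    if (!is_dir($dir) && !mkdir($dir, 0775, true)) {
        throw new RuntimeException("Cannot create $dir");
    }
    $created = 0;
    for ($n = $first; $n < $first + $count; $n++) {
        // Placeholder body in the same "key|value" style as the real files.
        if (file_put_contents("$dir/$n", "number|$n\n") !== false) {
            $created++;
        }
    }
    return $created;
}
```

For example, populateTestDir('/tmp/PORFiles', 101000, 800) would attempt to create 800 sequentially named files, after which count(glob('/tmp/PORFiles/1*')) can be compared against the number returned.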

when you used either scandir() or opendir()/readdir()/closedir(), did you display everything or did your code attempt to conditionally display only files that started with '1'?

could the file names start with a non-printing/white-space character so that they don't actually begin with a '1' character, either when they were created or through some copy or rename operation?
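
One way to check for stray characters is to dump every directory entry unfiltered, along with the raw bytes of its name. The sketch below is only an illustration: findSuspectNames() is a hypothetical helper, and it treats any name that is not made up purely of ASCII digits as suspect, so unrelated files in the folder will be reported too.

```php
<?php
// Hypothetical helper: return the names that are not purely ASCII digits,
// mapped to a hex dump of their bytes so hidden characters become visible.
function findSuspectNames(array $names): array
{
    $suspect = [];
    foreach ($names as $name) {
        // The D modifier stops "$" from matching before a trailing newline.
        if (!preg_match('/^[0-9]+$/D', $name)) {
            $suspect[$name] = bin2hex($name);
        }
    }
    return $suspect;
}
```

A possible call is findSuspectNames(array_diff(scandir($dir), ['.', '..'])). A name with a leading space would show up with a hex dump starting with "20" rather than "31" (the byte for '1').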

10 hours ago, mac_gyver said:

when you used either scandir() or opendir()/readdir()/closedir(), did you display everything or did your code attempt to conditionally display only files that started with '1'?

could the file names start with a non-printing/white-space character so that they don't actually begin with a '1' character, either when they were created or through some copy or rename operation?

Interesting. Let me check on that. Incidentally, I wrote a little Python gadget to do essentially the same thing and it does display all files, in a terminal at least. To answer your question, I *have* been using "1*" to filter out some other stuff in the folder. Maybe I should change that ...

To requinix: on my test server, about 800 files.

Thanks for responding, both of you. I'll keep you posted on this. This here's a good'un. 😁

1 minute ago, spoole said:

To requinix: on my test server, about 800 files.

800 isn't enough to run into the one or two PHP bugs I know of (which also have their own specific circumstances) but you should avoid putting everything into one directory on principle. Typically one partitions into ~a hundred or so per directory "tier".

Though really, you shouldn't be storing stuff in files like this to begin with.
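
To illustrate the "tier" idea, here is a rough sketch that maps a purchase order number to a subdirectory holding at most 100 consecutive numbers. The function name tieredPath() and the directory layout are assumptions made up for this example, not something described in the thread.

```php
<?php
// Hypothetical layout: <base>/<number div 100>/<number>, so each
// subdirectory holds at most $perDir consecutive purchase orders.
function tieredPath(string $baseDir, int $porNumber, int $perDir = 100): string
{
    $bucket = intdiv($porNumber, $perDir);
    return rtrim($baseDir, '/') . "/$bucket/$porNumber";
}
```

With this layout, tieredPath('/data/PORs', 101147) yields "/data/PORs/1011/101147", and numbers 101100 through 101199 all share the "1011" directory.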

1 hour ago, requinix said:

800 isn't enough to run into the one or two PHP bugs I know of (which also have their own specific circumstances) but you should avoid putting everything into one directory on principle. Typically one partitions into ~a hundred or so per directory "tier".

Though really, you shouldn't be storing stuff in files like this to begin with.

Well, the latter is a matter of opinion, but I'll take it under advisement. If you look at how much file activity a typical WordPress site does, even with a database underlying it, you'll see why I say that. A typical busy commercial site can easily have multiple thousands of text and image files.

A little history: I never expected this system to be as heavily used as it has been. I was expecting maybe 20-30 purchase orders in a really heavy month; in the year that it has been in operation, the count is now over 1,000(!). I'm working on a complete rewrite that will use MariaDB to make it easier to do searches and changes. The problem is, I need this to work in the interim while I try to bring the New Improved version online!

Anyway, the plot thickens. I don't think it's PHP, I think it's something underlying that. Maybe even the OS. As proof, I wrote a Python program to re-index the files. A "cat" dump in a terminal shows the very same files missing. Same numbers, same span. One final puzzle is that my home server (openSUSE) uses Btrfs, and the RHEL server at work uses XFS. Hard to blame it on the file system ...

Thanks for the replies. You've given me some pointers. If I figure it out, I'll come back and post a detailed answer in case anyone else runs across something like this.

BTW -- I did a bunch of Googlin' before I came here, and I saw some of those PHP bugs that you mentioned. But I really don't think they apply here, not if Python is experiencing the same thing.

 

Well ... when I do something stupid, I own up to it. One problem with numerically-sequential file names is that it's easy to see something that isn't there.

Take a look at my original post. 101007 isn't the same as 101107. Not even close. While my eyes apparently don't work (I did just change my glasses), my math is fine: that's at least, oh, A HUNDRED different. So, I had a glitch in my "get next available number" code that must've jumped over the "missing" files.
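
For anyone chasing a similar "missing files" scare, a quick way to tell whether numbers were skipped at creation time (rather than hidden at listing time) is to compute the gaps directly from the names. The following is a minimal sketch; findNumberGaps() is a hypothetical helper, and it simply ignores any name that is not all digits.

```php
<?php
// Hypothetical helper: given a list of file names, return each run of
// missing numbers as a [firstMissing, lastMissing] pair.
function findNumberGaps(array $names): array
{
    // Keep only all-digit names and convert them to integers.
    $digitsOnly = array_filter(
        $names,
        fn($n) => is_string($n) && preg_match('/^\d+$/D', $n) === 1
    );
    $numbers = array_map('intval', $digitsOnly);
    sort($numbers);

    $gaps = [];
    for ($i = 1; $i < count($numbers); $i++) {
        if ($numbers[$i] - $numbers[$i - 1] > 1) {
            $gaps[] = [$numbers[$i - 1] + 1, $numbers[$i] - 1];
        }
    }
    return $gaps;
}
```

Fed the four names from the listing in the original post (101105, 101106, 101146, 101147), it reports a single gap from 101107 to 101145. If the same gap appears no matter which tool produces the listing, the files were most likely never created.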

Sigh. Whimper. False alarm.

I need new glasses. I'm an old man. Sorry for the ring. GET OFF MY LAWN. *Whimper again.* And stuff like that.

Just to help pay you guys back, I'll browse here and pitch in from time to time. Assuming I CAN SEE. Maybe.
